Sunday, August 25, 2024

Dusun DSOM-042R: A Rockchip RK3588M System-on-Module for Automotive AIoT Applications

 


Dusun has introduced the DSOM-042R, a robust system-on-module (SoM) based on the Rockchip RK3588M, an automotive-grade AI SoC. Designed to handle the demanding environments of automotive AIoT applications, the module comes with 8GB of RAM, 128GB of eMMC flash storage, and operates in temperatures ranging from -40°C to 85°C. The module features high-density connectors that expose numerous interfaces supported by the octa-core Cortex-A76/A55 processor.

Key Features of the RK3588M SoC:

  • CPU:
    • 4x Cortex-A76 cores @ up to 2.1 GHz
    • 4x Cortex-A55 cores @ up to 1.7 GHz
  • GPU:
    • Arm Mali-G610 MP4 with support for OpenGL ES 3.2, OpenCL 2.2, Vulkan 1.1
  • NPU:
    • 6 TOPS AI accelerator
  • VPU:
    • Video Decoding: 8Kp60 H.265/VP9/AVS2, 8Kp30 H.264 AVC/MVC, 4Kp60 AV1, 1080p60 MPEG-2/-1/VC-1/VP8
    • Video Encoding: 8Kp30 H.265/H.264
    • Capable of up to 32-channel 1080p30 decoding and 16-channel 1080p30 encoding

Dusun RK3588M SoM Specifications:

  • System Memory:
    • 8GB RAM
  • Storage:
    • 128GB eMMC flash
  • Connectors:
    • 1x 100-pin B2B connector
    • 3x 80-pin B2B connectors
  • Video Output:
    • HDMI 2.1 up to 8Kp60 or 4Kp120
    • HDMI 2.0 up to 4Kp60
    • 2x MIPI DSI display interfaces up to 4Kp60
    • 2x DisplayPort 1.4 up to 8Kp30 (multiplexed with USB 3.0)
    • 2x eDP 1.3 up to 4Kp60
    • BT.1120 up to 1080p60
    • Supports up to seven displays
  • Video Input:
    • 1x 4-lane MIPI CSI or 2x 2-lane MIPI CSI
    • 2x MIPI DC (4-lane D-PHY v2.0 or 3-trio C-PHY v1.1)
    • DVP camera interface up to 150MHz input data
  • Audio:
    • 2x 8-channel I2S
    • 2x 2-channel I2S
    • 2x SPDIF
    • 2x 8-channel PDM (supports multiple MIC arrays)
    • Dual-channel digital audio codec (16-bit DAC)
    • VAD (Voice Activity Detection)
  • Networking:
    • Gigabit Ethernet
  • USB:
    • 3x USB 3.0
    • 4x USB 2.0
    • 2x USB 2.0 OTG
  • High-Speed Interfaces:
    • PCIe Gen 3.0 x4 (configurable as 1x x4, 2x x2, or 4x x1)
    • 3x PCIe Gen 2.0 x1
  • Low-Speed I/Os:
    • 9x I2C
    • 10x UART
    • 5x SPI
    • 16x PWM
    • GPIOs
  • Analog:
    • 7x ADC
  • Power:
    • Supply Voltage: 5V
    • Power Consumption:
      • Idle: 1.35W
      • Typical: 4.8W
      • Max: 20W
  • Dimensions:
    • 66 x 50 x 5.8mm
  • Operating Temperature:
    • -40°C to 85°C
  • Storage Temperature:
    • -40°C to 105°C
  • Humidity:
    • 10% to 80% (non-condensing)

Operating System Support:

The DSOM-042R supports a variety of operating systems, including:

  • Android 12.0
  • Ubuntu Desktop and Server
  • Debian 11
  • Buildroot RTLinux


The Dusun DSOM-042R can be evaluated using the DSGW-380 carrier board for gateways. However, this board is better suited for the Rockchip RK3588 version of the module and may not support advanced automotive features such as multiple cameras or extensive display interfaces. For specific automotive applications, custom carrier boards would need to be developed either by the customer or through Dusun’s ODM services.

Due to the custom nature of this design, pricing and availability details are not provided upfront. Interested customers are encouraged to contact Dusun directly to discuss their specific project requirements.

Monday, March 25, 2024

Using AI to expand global access to reliable flood forecasts

Large-scale global flood forecasting has been out of reach for a long time. In our Nature paper published today, we show how breakthroughs in AI can close this gap and provide reliable flood predictions, even in regions that previously lacked data.

Floods are the most common natural disaster, and are responsible for roughly $50 billion in annual financial damages worldwide. The rate of flood-related disasters has more than doubled since 2000, partly due to climate change. Nearly 1.5 billion people, making up 19% of the world’s population, are exposed to substantial risks from severe flood events. Upgrading early warning systems to make accurate and timely information accessible to these populations can save thousands of lives per year.

Driven by the potential impact of reliable flood forecasting on people’s lives globally, we started our flood forecasting effort in 2017. Over this multi-year journey, we advanced research hand-in-hand with building a real-time operational flood forecasting system that provides alerts on Google Search, Maps, Android notifications and through the Flood Hub. However, to scale globally, especially in places where accurate local data are not available, more research advances were required.

In “Global prediction of extreme floods in ungauged watersheds”, published in Nature, we demonstrate how machine learning (ML) technologies can significantly improve global-scale flood forecasting relative to the current state of the art for countries where flood-related data are scarce. With these AI-based technologies, we extended the lead time at which forecasts are as reliable as currently available global nowcasts, on average, from zero to five days, and improved forecasts across regions in Africa and Asia to be similar to what is currently available in Europe. The evaluation of the models was conducted in collaboration with the European Centre for Medium-Range Weather Forecasts (ECMWF).

These technologies also enable Flood Hub to provide real-time river forecasts up to seven days in advance, covering river reaches across over 80 countries. This information can be used by people, communities, governments and international organizations to take anticipatory action to help protect vulnerable populations.


Flood forecasting at Google

The ML models that power the Flood Hub tool are the product of many years of research, conducted in collaboration with several partners, including academics, governments, international organizations, and NGOs.

In 2018, we launched a pilot early warning system in the Ganges-Brahmaputra river basin in India, with the hypothesis that ML could help address the challenging problem of reliable flood forecasting at scale. The pilot was further expanded the following year via the combination of an inundation model, real-time water level measurements, the creation of an elevation map and hydrologic modeling.

In collaboration with academics, and in particular with the JKU Institute for Machine Learning, we explored ML-based hydrologic models, showing that LSTM-based models could produce more accurate simulations than traditional conceptual and physics-based hydrology models. This research led to flood forecasting improvements that enabled the expansion of our forecasting coverage to include all of India and Bangladesh. We also worked with researchers at Yale University to test technological interventions that increase the reach and impact of flood warnings.

Our hydrological models predict river floods by processing publicly available weather data like precipitation and physical watershed information. Such models must be calibrated to long data records from streamflow gauging stations in individual rivers. A low percentage of global river watersheds (basins) have streamflow gauges, which are expensive but necessary to supply relevant data, and it’s challenging for hydrological simulation and forecasting to provide predictions in basins that lack this infrastructure. Lower gross domestic product (GDP) is correlated with increased vulnerability to flood risks, and there is an inverse correlation between national GDP and the amount of publicly available data in a country. ML helps to address this problem by allowing a single model to be trained on all available river data and to be applied to ungauged basins where no data are available. In this way, models can be trained globally, and can make predictions for any river location.


There is an inverse (log-log) correlation between the amount of publicly available streamflow data in a country and national GDP. Streamflow data from the Global Runoff Data Center.
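Because a single model is trained across many basins, its ability to serve ungauged basins can be checked by holding out whole basins, rather than time periods, during evaluation. The Python sketch below illustrates that idea under simple assumptions: it splits gauge IDs into k folds so that every test basin is effectively ungauged for the model trained on the remaining folds. The function name `ungauged_folds`, the round-robin fold assignment, and the gauge IDs are hypothetical, not the paper's exact evaluation protocol.

```python
import random
from typing import List, Sequence, Tuple


def ungauged_folds(
    basin_ids: Sequence[str], k: int = 10, seed: int = 0
) -> List[Tuple[List[str], List[str]]]:
    """Split basins into k folds; each fold's basins are held out entirely.

    Every basin lands in exactly one test fold, and a model trained on the
    matching training list never sees any streamflow from those basins.
    """
    ids = sorted(set(basin_ids))
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # round-robin assignment to folds
    splits = []
    for i, test in enumerate(folds):
        train = [b for j, fold in enumerate(folds) if j != i for b in fold]
        splits.append((train, test))
    return splits


# Hypothetical gauge IDs; a real setup would use the gauged basins' identifiers.
basins = [f"gauge_{n:04d}" for n in range(20)]
for train, test in ungauged_folds(basins, k=4):
    print(len(train), "training basins,", len(test), "held-out (ungauged) basins")
```

Splitting by basin rather than by date keeps any information about a test river out of training, which is what "ungauged" means for the evaluation.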

Our academic collaborations led to ML research that developed methods to estimate uncertainty in river forecasts and showed how ML river forecast models synthesize information from multiple data sources. This research also demonstrated that these models can simulate extreme events reliably, even when those events are not part of the training data. In an effort to contribute to open science, in 2023 we open-sourced a community-driven dataset for large-sample hydrology, published in Nature Scientific Data.

The river forecast model

Most hydrology models used by national and international agencies for flood forecasting and river modeling are state-space models, which depend only on daily inputs (e.g., precipitation, temperature, etc.) and the current state of the system (e.g., soil moisture, snowpack, etc.). LSTMs are a variant of state-space models and work by defining a neural network that represents a single time step, where input data (such as current weather conditions) are processed to produce updated state information and output values (streamflow) for that time step. LSTMs are applied sequentially to make time-series predictions, and in this sense, behave similarly to how scientists typically conceptualize hydrologic systems. Empirically, we have found that LSTMs perform well on the task of river forecasting.


A diagram of the LSTM, which is a neural network that operates sequentially in time.
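To make the state-space framing concrete, the sketch below implements a deliberately simple hand-written state-space model, a single linear reservoir (a toy chosen for illustration, not one of the agency models mentioned above). Each daily step takes the current state (water storage) and that day's inputs and returns an updated state plus a streamflow output; applying the step sequentially produces a time series. An LSTM follows the same step-by-step pattern, except that its state is a learned vector and its update rule is a trained neural network. The drainage coefficient `k` and the function names are assumptions.

```python
from typing import List, Tuple


def bucket_step(storage: float, precip: float, evap: float, k: float = 0.1) -> Tuple[float, float]:
    """Advance a toy linear-reservoir model by one day.

    State: water storage (mm). Inputs: precipitation and evaporation (mm/day).
    Output: streamflow (mm/day), a fixed fraction k of storage.
    """
    storage = max(storage + precip - evap, 0.0)  # add inputs, no negative storage
    flow = k * storage                           # linear reservoir outflow
    return storage - flow, flow                  # (updated state, output)


def simulate(precip: List[float], evap: List[float], storage: float = 0.0) -> List[float]:
    """Apply the step function sequentially, carrying the state forward."""
    flows = []
    for p, e in zip(precip, evap):
        storage, q = bucket_step(storage, p, e)
        flows.append(q)
    return flows


print(simulate([10.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
```

Feeding one 10 mm rain day followed by two dry days yields a receding hydrograph, with streamflow shrinking by a factor of `1 - k` each day.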

Our river forecast model uses two LSTMs applied sequentially: (1) a “hindcast” LSTM ingests historical weather data (dynamic hindcast features) up to the present time (or rather, the issue time of a forecast), and (2) a “forecast” LSTM ingests states from the hindcast LSTM along with forecasted weather data (dynamic forecast features) to make future predictions. One year of historical weather data are input into the hindcast LSTM, and seven days of forecasted weather data are input into the forecast LSTM. Static features include geographical and geophysical characteristics of watersheds that are input into both the hindcast and forecast LSTMs and allow the model to learn different hydrological behaviors and responses in various types of watersheds.
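The hindcast/forecast hand-off can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the production model: weights are random and untrained, the hidden size of 16 and the feature counts are placeholders, and the forecast LSTM simply continues from the hindcast LSTM's final hidden and cell states (the real model may transform those states before passing them on). Static attributes are concatenated to the weather inputs of both LSTMs, as described above, and the function returns one hidden vector per forecast day for a downstream head layer.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with random, untrained weights (for shape/flow only)."""

    def __init__(self, n_in: int, n_hidden: int, rng: np.random.Generator):
        # One stacked matrix for the input (i), forget (f), candidate (g), output (o) gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # updated cell state
        h = sigmoid(o) * np.tanh(c)                   # updated hidden state
        return h, c


def hindcast_forecast(static, hindcast_met, forecast_met, n_hidden=16, seed=0):
    """Run a hindcast LSTM over past weather, then a forecast LSTM over forecast weather.

    static:       (n_static,) watershed attributes, fed to both LSTMs
    hindcast_met: (n_past_days, n_hind) historical meteorology
    forecast_met: (n_lead_days, n_fore) forecast meteorology
    Returns one hidden vector per forecast day, shape (n_lead_days, n_hidden).
    """
    rng = np.random.default_rng(seed)
    hindcast = LSTMCell(static.size + hindcast_met.shape[1], n_hidden, rng)
    forecast = LSTMCell(static.size + forecast_met.shape[1], n_hidden, rng)
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in hindcast_met:  # (1) spin up the state up to the issue time
        h, c = hindcast.step(np.concatenate([static, x]), h, c)
    outputs = []
    for x in forecast_met:  # (2) continue from the hindcast state (direct copy assumed)
        h, c = forecast.step(np.concatenate([static, x]), h, c)
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(1)
out = hindcast_forecast(
    static=rng.normal(size=8),               # placeholder watershed attributes
    hindcast_met=rng.normal(size=(365, 6)),  # one year of daily past weather
    forecast_met=rng.normal(size=(7, 6)),    # seven-day forecast horizon
)
print(out.shape)  # (7, 16)
```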

Output from the forecast LSTM is fed into a “head” layer that uses mixture density networks to produce a probabilistic forecast (i.e., predicted parameters of a probability distribution over streamflow). Specifically, the model predicts the parameters of a mixture of heavy-tailed probability density functions, called asymmetric Laplacian distributions, at each forecast time step. The result is a mixture density function, called a Countable Mixture of Asymmetric Laplacians (CMAL) distribution, which represents a probabilistic prediction of the volumetric flow rate in a particular river at a particular time.


LSTM-based river forecast model architecture. Two LSTMs are applied in sequence, one ingesting historical weather data and one ingesting forecasted weather data. The model outputs are the parameters of a probability distribution over streamflow at each forecasted timestep.
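The sketch below shows, under assumptions, how a mixture-density head can turn raw network outputs into a CMAL distribution and how that distribution yields flood-relevant quantities such as the probability that streamflow exceeds a threshold. It uses one common parameterization of the asymmetric Laplacian (location, scale, and an asymmetry parameter in (0, 1)); the row layout of the raw outputs and the softplus/sigmoid/softmax transforms are illustrative choices rather than a documented specification of the model.

```python
import numpy as np


def cmal_params(raw: np.ndarray):
    """Map raw head outputs of shape (4, K) to valid parameters of K components.

    Assumed row layout: location, scale, asymmetry, mixture logit.
    """
    loc = raw[0]
    scale = np.logaddexp(0.0, raw[1]) + 1e-6  # softplus keeps scale > 0
    tau = 1.0 / (1.0 + np.exp(-raw[2]))       # sigmoid keeps asymmetry in (0, 1)
    w = np.exp(raw[3] - raw[3].max())
    w = w / w.sum()                           # softmax: weights sum to 1
    return loc, scale, tau, w


def ald_pdf(y, loc, scale, tau):
    """Asymmetric Laplace density of each component, evaluated at y."""
    u = (y - loc) / scale
    return tau * (1 - tau) / scale * np.exp(-u * (tau - (u < 0).astype(float)))


def ald_cdf(y, loc, scale, tau):
    """Closed-form CDF of each component (both branches computed overflow-free)."""
    u = (y - loc) / scale
    below = tau * np.exp((1 - tau) * np.minimum(u, 0.0))
    above = 1 - (1 - tau) * np.exp(-tau * np.maximum(u, 0.0))
    return np.where(u < 0, below, above)


def cmal_pdf(y, loc, scale, tau, w):
    return float(np.sum(w * ald_pdf(y, loc, scale, tau)))


def cmal_exceedance(q, loc, scale, tau, w):
    """Probability that streamflow exceeds q under the mixture."""
    return float(1.0 - np.sum(w * ald_cdf(q, loc, scale, tau)))


# Hypothetical raw outputs for a two-component mixture at one forecast step.
raw = np.array([[120.0, 300.0], [3.0, 4.0], [0.0, 1.0], [1.0, -1.0]])
params = cmal_params(raw)
print("P(flow > 250):", cmal_exceedance(250.0, *params))
```

The softmax guarantees the mixture weights sum to one, and the closed-form CDF makes exceedance probabilities cheap to compute, which is convenient when comparing a forecast against a warning threshold.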

Input and training data

The model uses three types of publicly available data inputs, mostly from governmental sources:

  1. Static watershed attributes representing geographical and geophysical variables: From the HydroATLAS project, including data like long-term climate indexes (precipitation, temperature, snow fractions), land cover, and anthropogenic attributes (e.g., a nighttime lights index as a proxy for human development).
  2. Historical meteorological time-series data: Used to spin up the model for one year prior to the issue time of a forecast. The data come from NASA IMERG, the NOAA CPC Global Unified Gauge-Based Analysis of Daily Precipitation, and the ECMWF ERA5-Land reanalysis. Variables include daily total precipitation, air temperature, solar and thermal radiation, snowfall, and surface pressure.
  3. Forecasted meteorological time series over a seven-day forecast horizon: Used as input for the forecast LSTM. These data are the same meteorological variables listed above, and come from the ECMWF HRES atmospheric model.

Training data are daily streamflow values from the Global Runoff Data Center over the period 1980 to 2023. A single streamflow forecast model is trained on data from 5,680 diverse watershed streamflow gauges to improve accuracy.


Locations of the 5,680 Global Runoff Data Center streamflow gauges that supply training data for the river forecast model.

Improving on the current state-of-the-art

We compared our river forecast model with GloFAS version 4, the current state-of-the-art global flood forecasting system. These experiments showed that ML can provide accurate warnings earlier and over larger and more impactful events.

The figure below shows the distribution of F1 scores when predicting events of different severity at river locations around the world, counting a prediction as correct if it falls within plus or minus one day of the observed event. The F1 score is the harmonic mean of precision and recall, and event severity is measured by return period. For example, a 2-year return period event is a volume of streamflow that is expected to be exceeded on average once every two years. Our model achieves reliability scores at lead times of up to 4 or 5 days that are, on average, similar to or better than the reliability of GloFAS nowcasts (0-day lead time).


Distributions of F1 scores over 2-year return period events in 2,092 watersheds globally during the time period 2014-2023 from GloFAS (blue) and our model (orange) at different lead times. On average, our model is statistically as accurate as GloFAS nowcasts (0-day lead time) up to 5 days in advance over 2-year (shown) and 1-year, 5-year, and 10-year events (not shown).
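The evaluation above can be sketched as follows. The code computes precision, recall, and their harmonic mean (F1) for threshold-exceedance events, treating a day as a hit if the other series also exceeds the threshold within ±`tol` days. Two simplifying assumptions are worth flagging: events are matched day by day rather than as contiguous multi-day events, and the T-year return-period threshold is estimated as the empirical (1 - 1/T) quantile of annual maximum flows rather than from a fitted extreme-value distribution. Neither is necessarily how the paper defines them, and the function names are hypothetical.

```python
import numpy as np


def return_period_threshold(annual_maxima, T):
    """Flow exceeded on average once every T years, estimated as the (1 - 1/T)
    empirical quantile of annual maximum flows (a simplifying assumption)."""
    return float(np.quantile(np.asarray(annual_maxima, dtype=float), 1.0 - 1.0 / T))


def tolerant_hits(a, b, tol):
    """For each event day in a, is there an event day in b within +/- tol days?"""
    idx_b = np.flatnonzero(np.asarray(b, dtype=bool))
    return [bool(np.any(np.abs(idx_b - i) <= tol))
            for i in np.flatnonzero(np.asarray(a, dtype=bool))]


def event_f1(obs_flow, sim_flow, threshold, tol=1):
    obs = np.asarray(obs_flow) > threshold
    sim = np.asarray(sim_flow) > threshold
    pred_hits = tolerant_hits(sim, obs, tol)  # precision: predicted events that occurred
    obs_hits = tolerant_hits(obs, sim, tol)   # recall: observed events that were predicted
    if not pred_hits or not obs_hits:
        return 0.0  # convention: no events in one series gives F1 = 0
    precision = sum(pred_hits) / len(pred_hits)
    recall = sum(obs_hits) / len(obs_hits)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)  # harmonic mean


# Hypothetical daily flows; the threshold would normally come from return_period_threshold.
obs = [0, 0, 5, 5, 0, 0, 0, 5, 0, 0]
sim = [0, 0, 0, 5, 5, 0, 0, 0, 0, 0]
print(event_f1(obs, sim, threshold=1.0, tol=1), event_f1(obs, sim, threshold=1.0, tol=0))
```

Loosening the tolerance from zero to one day raises precision here because the simulated peak arrives one day late, which is exactly the kind of timing slack the ±1 day criterion allows.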

Thursday, July 27, 2023

AI helps household robots cut planning time in half

PIGINet leverages machine learning to streamline and enhance household robots' task and motion planning by assessing and filtering feasible solutions in complex environments.

Illustration with four panels shows 3D models of robots performing various tasks. From top left: a pan of potatoes on top of a stove, with a robot holding a food item. To the right is a robot standing in front of a cabinet. In the bottom left, a robot stands in front of a full kitchen and reaches for a pot. In the bottom right quadrant, a robotic gripper reaches into the sink for an item, next to two water bottles.

Caption:
PIGINet predicts the feasibility of a task plan given images of objects, goal description, and initial state descriptions. It reduces the planning time of a task and motion planner by 50-80 percent by eliminating infeasible task plans.
Credits:
Images: Alex Shipps/CSAIL

Your brand new household robot is delivered to your house, and you ask it to make you a cup of coffee. Although it knows some basic skills from previous practice in simulated kitchens, there are way too many actions it could possibly take — turning on the faucet, flushing the toilet, emptying out the flour container, and so on. But there’s a tiny number of actions that could possibly be useful. How is the robot to figure out what steps are sensible in a new situation?

It could use PIGINet, a new system that aims to efficiently enhance the problem-solving capabilities of household robots. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are using machine learning to cut down on the typical iterative process of task planning that considers all possible actions. PIGINet eliminates task plans that can’t satisfy collision-free requirements, and reduces planning time by 50-80 percent when trained on only 300-500 problems. 

Typically, robots attempt various task plans and iteratively refine their moves until they find a feasible solution, which can be inefficient and time-consuming, especially when there are movable and articulated obstacles. Maybe after cooking, for example, you want to put all the sauces in the cabinet. That problem might take two to eight steps depending on what the world looks like at that moment. Does the robot need to open multiple cabinet doors, or are there any obstacles inside the cabinet that need to be relocated in order to make space? You don’t want your robot to be annoyingly slow — and it will be worse if it burns dinner while it’s thinking.
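The loop described above, with a learned filter in front of the expensive refinement step, can be sketched as follows. This is a hypothetical outline rather than PIGINet's actual code: `feasibility` stands in for a learned predictor of whether a task plan can be refined into collision-free motions, `refine` stands in for the motion planner that verifies it, and the 0.5 cutoff is an arbitrary assumption. Plans scored below the cutoff are skipped without paying for motion planning.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Sequence


def plan_with_feasibility_filter(
    candidate_plans: Iterable[Sequence[Hashable]],
    feasibility: Callable[[Sequence[Hashable]], float],
    refine: Callable[[Sequence[Hashable]], Optional[List]],
    threshold: float = 0.5,
) -> Optional[List]:
    """Return the first motion plan found, skipping plans predicted infeasible.

    feasibility: hypothetical learned score in [0, 1] for a task plan.
    refine:      hypothetical motion planner; returns a motion plan or None.
    """
    for plan in candidate_plans:
        if feasibility(plan) < threshold:
            continue  # prune: never run the motion planner on this plan
        motion = refine(plan)
        if motion is not None:
            return motion
    return None


# Hypothetical scores and planner outcomes for three candidate task plans.
scores = {("open_right_door", "place_sauce"): 0.1, ("pick_pot",): 0.9,
          ("open_left_door", "place_sauce"): 0.8}
feasible = {("open_left_door", "place_sauce")}
result = plan_with_feasibility_filter(
    scores, lambda plan: scores[plan], lambda plan: list(plan) if plan in feasible else None)
print(result)
```

Because the motion planner still checks every plan that survives the filter, an over-optimistic prediction only wastes planning time; an over-pessimistic one can prune a feasible plan, so a practical system might fall back to the pruned candidates if none of the survivors succeed.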

Household robots are usually thought of as following predefined recipes for performing tasks, which isn’t always suitable for diverse or changing environments. So, how does PIGINet avoid those predefined rules? PIGINet is a neural network that takes in “Plans, Images, Goal, and Initial facts,” then predicts the probability that a task plan can be refined to find feasible motion plans. In simple terms, it employs a transformer encoder, a versatile and state-of-the-art model designed to operate on data sequences. The input sequence, in this case, is information about which task plan it is considering, images of the environment, and symbolic encodings of the initial state and the desired goal. The encoder combines the task plans, image, and text to generate a prediction regarding the feasibility of the selected task plan. 
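A toy version of this idea in NumPy: plan-step, image, goal, and initial-fact tokens are tagged with a modality embedding, plan steps also get position embeddings, and a single self-attention layer with mean pooling and a logistic head produces a feasibility probability. This is a simplified, untrained sketch under explicit assumptions, not PIGINet's architecture: a real transformer encoder stacks several multi-head layers with feed-forward blocks and normalization, and the tokens here are random placeholders for embeddings that upstream encoders (for example, a pretrained vision-language model) would produce. The width `D`, `MAX_STEPS`, and all function names are hypothetical.

```python
import numpy as np

D = 32          # token embedding width (assumed)
MAX_STEPS = 16  # longest task plan we embed positions for (assumed)


def init_params(rng: np.random.Generator) -> dict:
    s = D ** -0.5
    return {
        "type": rng.normal(0.0, 0.02, size=(4, D)),         # plan / image / goal / init
        "pos": rng.normal(0.0, 0.02, size=(MAX_STEPS, D)),  # order of plan steps
        "Wq": rng.normal(0.0, s, size=(D, D)),
        "Wk": rng.normal(0.0, s, size=(D, D)),
        "Wv": rng.normal(0.0, s, size=(D, D)),
        "w_out": rng.normal(0.0, s, size=D),
    }


def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return attn @ v


def feasibility(plan_tok, image_tok, goal_tok, init_tok, p) -> float:
    """Probability (untrained here) that a task plan can be refined into motions."""
    plan_tok = plan_tok + p["pos"][: len(plan_tok)]
    segments = [plan_tok, image_tok, goal_tok, init_tok]
    tokens = np.concatenate([seg + p["type"][i] for i, seg in enumerate(segments)])
    h = tokens + self_attention(tokens, p["Wq"], p["Wk"], p["Wv"])  # residual connection
    logit = h.mean(axis=0) @ p["w_out"]                              # pool, then linear head
    return float(1.0 / (1.0 + np.exp(-logit)))


rng = np.random.default_rng(0)
params = init_params(rng)
score = feasibility(
    rng.normal(size=(5, D)),  # five plan steps (placeholder embeddings)
    rng.normal(size=(9, D)),  # image patch tokens
    rng.normal(size=(1, D)),  # goal description
    rng.normal(size=(4, D)),  # initial-state facts
    params,
)
print(f"predicted feasibility: {score:.3f}")
```

Concatenating all modalities into one sequence lets attention relate each plan step directly to image regions and symbolic facts, which is the property the paragraph above attributes to the encoder.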

Keeping things in the kitchen, the team created hundreds of simulated environments, each with different layouts and specific tasks that require objects to be rearranged among counters, fridges, cabinets, sinks, and cooking pots. By measuring the time taken to solve problems, they compared PIGINet against prior approaches. One correct task plan may include opening the left fridge door, removing a pot lid, moving the cabbage from pot to fridge, moving a potato to the fridge, picking up the bottle from the sink, placing the bottle in the sink, picking up the tomato, and placing the tomato. PIGINet significantly reduced planning time, by 80 percent in simpler scenarios and 20-50 percent in more complex scenarios that have longer plan sequences and less training data.

“Systems such as PIGINet, which use the power of data-driven methods to handle familiar cases efficiently, but can still fall back on ‘first-principles’ planning methods to verify learning-based suggestions and solve novel problems, offer the best of both worlds, providing reliable and efficient general-purpose solutions to a wide variety of problems,” says MIT Professor and CSAIL Principal Investigator Leslie Pack Kaelbling.

PIGINet's use of multimodal embeddings in the input sequence allowed for better representation and understanding of complex geometric relationships. Using image data helped the model grasp spatial arrangements and object configurations without knowing the objects’ 3D meshes for precise collision checking, enabling fast decision-making in different environments.

One of the major challenges faced during the development of PIGINet was the scarcity of good training data, as all feasible and infeasible plans need to be generated by traditional planners, which are slow in the first place. However, by using pretrained vision-language models and data augmentation tricks, the team was able to address this challenge, showing impressive planning-time reductions not only on problems with seen objects but also in zero-shot generalization to previously unseen objects.

“Because everyone’s home is different, robots should be adaptable problem-solvers instead of just recipe followers. Our key idea is to let a general-purpose task planner generate candidate task plans and use a deep learning model to select the promising ones. The result is a more efficient, adaptable, and practical household robot, one that can nimbly navigate even complex and dynamic environments. Moreover, the practical applications of PIGINet are not confined to households,” says Zhutian Yang, MIT CSAIL PhD student and lead author on the work. “Our future aim is to further refine PIGINet to suggest alternate task plans after identifying infeasible actions, which will further speed up the generation of feasible task plans without the need for big datasets for training a general-purpose planner from scratch. We believe that this could revolutionize the way robots are trained during development and then applied to everyone’s homes.”

“This paper addresses the fundamental challenge in implementing a general-purpose robot: how to learn from past experience to speed up the decision-making process in unstructured environments filled with a large number of articulated and movable obstacles,” says Beomjoon Kim PhD ’20, assistant professor in the Graduate School of AI at Korea Advanced Institute of Science and Technology (KAIST). “The core bottleneck in such problems is how to determine a high-level task plan such that there exists a low-level motion plan that realizes the high-level plan. Typically, you have to oscillate between motion and task planning, which causes significant computational inefficiency. Zhutian's work tackles this by using learning to eliminate infeasible task plans, and is a step in a promising direction.”

Yang wrote the paper with NVIDIA research scientist Caelan Garrett SB ’15, MEng ’15, PhD ’21; MIT Department of Electrical Engineering and Computer Science professors and CSAIL members Tomás Lozano-Pérez and Leslie Kaelbling; and Senior Director of Robotics Research at NVIDIA and University of Washington Professor Dieter Fox. The team was supported by AI Singapore and grants from the National Science Foundation, the Air Force Office of Scientific Research, and the Army Research Office. This project was partially conducted while Yang was an intern at NVIDIA Research. Their research will be presented in July at the conference Robotics: Science and Systems.

Thursday, June 30, 2022

Artificial neural networks model face processing in autism

 A new computational model could explain differences in recognizing facial emotions.

Photo of a woman in front of a white background with a cropped square outline focused on her face and small triangles marking the contours of her skin, indicating she is being scanned by a camera
Caption:
Autistic people often have a more difficult time recognizing emotions on others’ faces. New research sheds light on the inner workings of the brain to suggest an answer.

Many of us easily recognize emotions expressed in others’ faces. A smile may mean happiness, while a frown may indicate anger. Autistic people often have a more difficult time with this task. It’s unclear why. But new research, published June 15 in The Journal of Neuroscience, sheds light on the inner workings of the brain to suggest an answer. And it does so using a tool that opens new pathways to modeling the computation in our heads: artificial intelligence.

Researchers have primarily suggested two brain areas where the differences might lie. A region on the side of the primate (including human) brain called the inferior temporal (IT) cortex contributes to facial recognition. Meanwhile, a deeper region called the amygdala receives input from the IT cortex and other sources and helps process emotions.

Kohitij Kar, a research scientist in the lab of MIT Professor James DiCarlo, hoped to zero in on the answer. (DiCarlo, the Peter de Florez Professor in the Department of Brain and Cognitive Sciences, is a member of the McGovern Institute for Brain Research and director of MIT's Quest for Intelligence.)

Kar began by looking at data provided by two other researchers: Shuo Wang at Washington University in St. Louis and Ralph Adolphs at Caltech. In one experiment, they showed images of faces to autistic adults and to neurotypical controls. The images had been generated by software to vary on a spectrum from fearful to happy, and the participants judged, quickly, whether the faces depicted happiness. Compared with controls, autistic adults required higher levels of happiness in the faces to report them as happy.

Modeling the brain

Kar, who is also a member of the Center for Brains, Minds and Machines, trained an artificial neural network, a complex mathematical function inspired by the brain’s architecture, to perform the same task. The network contained layers of units that roughly resemble biological neurons that process visual information. These layers process information as it passes from an input image to a final judgment indicating the probability that the face is happy. Kar found that the network’s behavior more closely matched the neurotypical controls than it did the autistic adults.

The network also served two more interesting functions. First, Kar could dissect it. He stripped off layers and retested its performance, measuring the difference between how well it matched controls and how well it matched autistic adults. This difference was greatest when the output was based on the last network layer. Previous work has shown that this layer in some ways mimics the IT cortex, which sits near the end of the primate brain’s ventral visual processing pipeline. Kar’s results implicate the IT cortex in differentiating neurotypical controls from autistic adults.

The other function is that the network can be used to select images that might be more efficient in autism diagnoses. If the difference between how closely the network matches neurotypical controls versus autistic adults is greater when judging one set of images versus another set of images, the first set could be used in the clinic to detect autistic behavioral traits. “These are promising results,” Kar says. Better models of the brain will come along, “but oftentimes in the clinic, we don’t need to wait for the absolute best product.”

Next, Kar evaluated the role of the amygdala. Again, he used data from Wang and colleagues. They had used electrodes to record the activity of neurons in the amygdala of people undergoing surgery for epilepsy as they performed the face task. The team found that they could predict a person’s judgment based on these neurons’ activity. Kar reanalyzed the data, this time controlling for the ability of the IT-cortex-like network layer to predict whether a face truly was happy. Now, the amygdala provided very little information of its own. Kar concludes that the IT cortex is the driving force behind the amygdala’s role in judging facial emotion.
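The logic of controlling for the IT-like layer can be illustrated with a partial correlation on synthetic data: regress the shared predictor out of both the neural signal and the judgment, then correlate what is left. In the sketch below, both variables are generated from a common "IT-like" signal plus independent noise, so their raw correlation is high while their partial correlation is near zero. The data, the variable names, and the linear-regression approach are illustrative assumptions, not Kar's actual analysis.

```python
import numpy as np


def residualize(y, covariates):
    """Remove the part of y that is linearly predictable from the covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])  # intercept + covariates
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def partial_corr(x, y, control):
    """Correlation between x and y after regressing control out of both."""
    return float(np.corrcoef(residualize(x, control), residualize(y, control))[0, 1])


# Synthetic example: amygdala-like activity and judgments both driven by one shared signal.
rng = np.random.default_rng(0)
n = 5000
it_signal = rng.normal(size=n)
amygdala = it_signal + 0.5 * rng.normal(size=n)
judgment = it_signal + 0.5 * rng.normal(size=n)
print("raw r:", np.corrcoef(amygdala, judgment)[0, 1])
print("partial r given IT-like signal:", partial_corr(amygdala, judgment, it_signal))
```

If the amygdala signal carried information beyond the IT-like predictor, the partial correlation would stay well above zero; in this synthetic setup it collapses because the shared signal explains the whole relationship.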

Noisy networks

Finally, Kar trained separate neural networks to match the judgments of neurotypical controls and autistic adults. He looked at the strengths or “weights” of the connections between the final layers and the decision nodes. The weights in the network matching autistic adults, both the positive or “excitatory” and negative or “inhibitory” weights, were weaker than in the network matching neurotypical controls. This suggests that sensory neural connections in autistic adults might be noisy or inefficient.

To further test the noise hypothesis, which is popular in the field, Kar added various levels of fluctuation to the activity of the final layer in the network modeling autistic adults. Within a certain range, added noise greatly increased the similarity between its performance and that of the autistic adults. Adding noise to the control network did much less to improve its similarity to the control participants. This further suggests that sensory perception in autistic people may be the result of a so-called “noisy” brain.
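The noise manipulation can be illustrated with a toy psychometric curve: a one-unit logistic readout judges whether a face is happy from an activity value equal to the morph level (0 for fearful, 1 for happy), with optional Gaussian noise added before the readout. The readout, its weight and bias, and the trial counts are illustrative assumptions rather than Kar's network; the sketch only shows how added noise makes the simulated judgments less consistent across morph levels.

```python
import numpy as np


def psychometric_curve(morph_levels, noise_sd, weight=8.0, bias=-4.0, n_trials=2000, seed=0):
    """Fraction of simulated 'happy' judgments at each morph level.

    Morph level 0 = fearful, 1 = happy. Gaussian noise with standard deviation
    noise_sd is added to the activity before a logistic readout.
    """
    rng = np.random.default_rng(seed)
    fractions = []
    for m in morph_levels:
        activity = m + rng.normal(0.0, noise_sd, size=n_trials)  # noisy final-layer activity
        p_happy = 1.0 / (1.0 + np.exp(-(weight * activity + bias)))
        fractions.append(float(np.mean(rng.random(n_trials) < p_happy)))  # sampled judgments
    return np.array(fractions)


levels = np.linspace(0.0, 1.0, 6)
for sd in (0.0, 0.25, 0.5):
    print(sd, np.round(psychometric_curve(levels, sd), 2))
```

Lowering `weight` (with `bias = -weight / 2` so the curve stays centered at a morph level of 0.5) flattens the curve in a similar way, which gives one way to explore the weaker-weights observation with the same toy.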

Computational power

Looking forward, Kar sees several uses for computational models of visual processing. They can be further prodded, providing hypotheses that researchers might test in animal models. “I think facial emotion recognition is just the tip of the iceberg,” Kar says. They can also be used to select or even generate diagnostic content. Artificial intelligence could be used to generate content like movies and educational materials that optimally engage autistic children and adults. One might even tweak facial and other relevant pixels in what autistic people see in augmented reality goggles, work that Kar plans to pursue in the future.

Ultimately, Kar says, the work helps to validate the usefulness of computational models, especially image-processing neural networks. They formalize hypotheses and make them testable. Does one model or another better match behavioral data? “Even if these models are very far off from brains, they are falsifiable, rather than people just making up stories,” he says. “To me, that’s a more powerful version of science.”
