Large-scale global flood forecasting has been out of reach for a long time. In our Nature paper published today we show how breakthroughs in AI can close the gap & provide reliable flood predictions even in regions that previously lacked data.
Floods are the
, and are responsible for roughly in annual financial damages worldwide. The since the year 2000 partly . Nearly , making up 19% of the world’s population, are exposed to substantial risks from severe flood events. Upgrading early warning systems to make accurate and timely information accessible to these populations .Driven by the potential impact of reliable flood forecasting on people’s lives globally, we started our flood forecasting effort in 2017. Through this
, we advanced research over the years hand-in-hand with building a real-time operational flood forecasting system that on Google Search, Maps, Android notifications and through the . However, in order to , especially in places where accurate local data is not available, more research advances were required.In “
”, published in , we demonstrate how machine learning (ML) technologies can significantly improve global-scale relative to the current state-of-the-art for countries where flood-related data is scarce. With these AI-based technologies we extended the reliability of currently-available global nowcasts, on average, from zero to five days, and improved forecasts across regions in Africa and Asia to be similar to what are currently available in Europe. The evaluation of the models was conducted in collaboration with the European Center for Medium Range Weather Forecasting ( ).These technologies also enable
to provide real-time river forecasts up to seven days in advance, river reaches across over 80 countries. This information can be used by people, communities, governments and international organizations to take anticipatory action to help protect vulnerable populations.Flood forecasting at Google
The ML models that power the FloodHub tool are the product of many years of research, conducted in collaboration with several partners, including academics, governments, international organizations, and NGOs.
In 2018, we
early warning system in the Ganges-Brahmaputra river basin in India, with the that ML could help address the challenging problem of reliable flood forecasting at scale. The pilot was further the following year of an inundation model, real-time water level measurements, the creation of an elevation map and hydrologic modeling.In
with academics, and, in particular, with the we explored ML-based hydrologic models, showing that -based models could than traditional conceptual and physics-based . This research led to that enabled the of our forecasting coverage to include all of India and Bangladesh. We also worked with researchers at Yale University to test technological interventions that increase the of flood warnings.Our hydrological models predict river floods by processing publicly available weather data like precipitation and physical watershed information. Such models must be calibrated to long data records from
in individual rivers. A low percentage of global river watersheds (basins) have streamflow gauges, which are expensive but necessary to supply relevant data, and it’s challenging for hydrological simulation and forecasting to provide that lack this infrastructure. Lower (GDP) is correlated with increased , and there is an inverse correlation between national GDP and the amount of publicly available data in a country. ML helps to address this problem by allowing a and to be applied to ungauged basins where . In this way, models can be trained globally, and can make predictions for any river location.Our academic collaborations led to ML research that developed methods to
and showed how ML river forecast models . They demonstrated that these models can , even when those events are not part of the training data. In an effort to to open science, in 2023 we open-sourced a community-driven dataset for large-sample hydrology in .The river forecast model
Most hydrology models used by national and international agencies for flood forecasting and river modeling are state-space models, which depend only on daily inputs (e.g., precipitation, temperature, etc.) and the current state of the system (e.g., soil moisture, snowpack, etc.). LSTMs are a variant of state-space models and work by defining a neural network that represents a single time step, where input data (such as current weather conditions) are processed to produce updated state information and output values (streamflow) for that time step. LSTMs are applied sequentially to make time-series predictions, and in this sense, behave similarly to how scientists typically conceptualize hydrologic systems. Empirically, we have found that
on the task of river forecasting.Our river forecast model uses two LSTMs applied sequentially: (1) a “hindcast” LSTM ingests historical weather data (dynamic hindcast features) up to the present time (or rather, the issue time of a forecast), and (2) a “forecast” LSTM ingests states from the hindcast LSTM along with forecasted weather data (dynamic forecast features) to make future predictions. One year of historical weather data are input into the hindcast LSTM, and seven days of forecasted weather data are input into the forecast LSTM. Static features include geographical and geophysical characteristics of watersheds that are input into both the hindcast and forecast LSTMs and allow the model to learn different hydrological behaviors and responses in various types of watersheds.
Output from the forecast LSTM is fed into a “head” layer that uses
to produce a probabilistic forecast (i.e., predicted parameters of a probability distribution over streamflow). Specifically, the model predicts the parameters of a mixture of heavy-tailed probability density functions, called , at each forecast time step. The result is a mixture density function, called a (CMAL) distribution, which represents a probabilistic prediction of the volumetric flow rate in a particular river at a particular time.Input and training data
The model uses three types of publicly available data inputs, mostly from governmental sources:
- Static watershed attributes representing geographical and geophysical variables: From the , including data like long-term climate indexes (precipitation, temperature, snow fractions), land cover, and anthropogenic attributes (e.g., a nighttime lights index as a proxy for human development).
- Historical meteorological time-series data: Used to spin up the model for one year prior to the issue time of a forecast. The data comes from , , and the . Variables include daily total precipitation, air temperature, solar and thermal radiation, snowfall, and surface pressure.
- Forecasted meteorological time series over a seven-day forecast horizon: Used as input for the forecast LSTM. These data are the same meteorological variables listed above, and come from the .
Training data are daily streamflow values from the
over the time period 1980 - 2023. A single streamflow forecast model is trained using data from 5,680 diverse watershed streamflow gauges (shown below) to improve .Improving on the current state-of-the-art
We compared our river forecast model with
, the current state-of-the-art global flood forecasting system. These experiments showed that ML can provide accurate warnings earlier and over larger and more impactful events.The figure below shows the distribution of
when predicting different severity events at river locations around the world, with plus or minus 1 day accuracy. F1 scores are an average of precision and recall and event severity is measured by . For example, a 2-year return period event is a volume of streamflow that is expected to be exceeded on average once every two years. Our model achieves reliability scores at up to 4-day or 5-day lead times that are similar to or better, on average, than the reliability of GloFAS nowcasts (0-day lead time).