Interpretable feature incorporation machine-learning framework for flood magnitude estimation
Hydrology and Earth System Sciences Copernicus Publications 30:7 (2026) 2135-2160
Abstract:
Fluvial floods pose severe socioeconomic and environmental risks and are projected to change in frequency and severity in future decades. Estimating the magnitude of extreme floods remains challenging, particularly for sparse tail events. This motivates the need to identify predictors that hold across catchments and over time. Synoptic-scale weather patterns (WPs) are often more temporally persistent and predictable than local meteorological variables, such as precipitation. However, the value of weather patterns as predictors for flood magnitude estimation is not well established. This study introduces a feature incorporation machine learning framework to quantify the relative contribution of synoptic, meteorological, and catchment controls on winter peak-over-threshold (POT) flood magnitudes (≥99th percentile) in near-natural catchments across the United Kingdom (UK) Benchmark Network. We train Random Forest regression models for a pooled national sample and for multiple hydro-climatic regional samples, and examine model interpretability using Shapley Additive Explanations (SHAP). Additionally, we analyze the conditional probabilities of WPs co-occurring with flood events. Our results show that WPs associated with cyclonic low-pressure systems frequently coincide with floods but add minimal value to the estimation of their magnitude. In the UK model, skill is dominated by static catchment attributes, such as aridity, and by event-day precipitation, while variability in feature importance among the regional models reflects hydro-climatic contrasts. Our findings highlight how model outcomes vary with model structure and the choice of features.
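The POT event selection and the conditional probabilities of WPs co-occurring with floods described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a daily flow series already restricted to winter, uses hypothetical weather-pattern labels, takes the threshold as a linear-interpolation empirical quantile, and omits the declustering normally used to keep POT events independent. For each WP it reports P(WP | flood day) alongside the climatological P(WP); a ratio above 1 marks a pattern that occurs more often on flood days than on average.

```python
from collections import Counter


def empirical_quantile(values, q):
    """Linear-interpolation quantile of a sequence, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(s) - 1)
    value = s[lo] + (s[hi] - s[lo]) * (pos - lo)
    return min(value, s[hi])           # guard against floating-point overshoot


def wp_flood_probabilities(flows, wps, q=0.99):
    """Compare how often each weather pattern occurs on POT flood days
    versus on all days.

    flows: daily flow series (assumed already restricted to winter)
    wps:   weather-pattern label for each day (hypothetical labels)
    Returns the POT threshold and, for each WP, a tuple
    (P(WP | flood day), P(WP), ratio of the two).
    """
    if not flows or len(flows) != len(wps):
        raise ValueError("flows and wps must be non-empty and of equal length")
    threshold = empirical_quantile(flows, q)
    # Days at or above the threshold count as POT flood days.
    # No declustering: consecutive exceedances are not merged into one event.
    flood_wps = [wp for f, wp in zip(flows, wps) if f >= threshold]
    all_counts = Counter(wps)
    flood_counts = Counter(flood_wps)
    result = {}
    for wp, n_all in all_counts.items():
        p_given_flood = flood_counts[wp] / len(flood_wps)
        p_clim = n_all / len(wps)
        result[wp] = (p_given_flood, p_clim, p_given_flood / p_clim)
    return threshold, result


if __name__ == "__main__":
    # Toy data: 100 days; the 20 highest-flow days fall under a "cyclonic" WP.
    flows = list(range(1, 101))
    wps = ["cyclonic" if f > 80 else "anticyclonic" for f in flows]
    threshold, probs = wp_flood_probabilities(flows, wps, q=0.9)
    print(f"threshold = {threshold:.1f}")
    for wp, (p_flood, p_clim, ratio) in sorted(probs.items()):
        print(f"{wp}: P(WP|flood)={p_flood:.2f} P(WP)={p_clim:.2f} ratio={ratio:.1f}")
```

The toy series is too short for a 99th-percentile threshold, so the usage example lowers q to 0.9. The ten highest-flow days then count as floods, and all carry the cyclonic label, giving a ratio of 5 against its 20% climatological frequency. A high ratio of this kind measures co-occurrence only and says nothing on its own about a pattern's value for estimating flood magnitude.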
This study also offers methodological guidance for developing large-sample machine learning models for flood estimation that integrate atmospheric predictors with traditional hydro-meteorological and geographical variables within a feature incorporation framework.
Seasonal forecasting using the GenCast probabilistic machine learning model
Climate Dynamics Springer Nature 64:4 (2026) 148
Abstract:
Machine-learnt weather prediction (MLWP) models are now well established as being competitive with conventional numerical weather prediction (NWP) models in the medium range. However, there is still much uncertainty as to how this performance extends to longer timescales, where interactions with slower components of the Earth system become important. We take GenCast, a state-of-the-art probabilistic MLWP model, and apply it to the task of seasonal forecasting with prescribed sea surface temperature (SST), either by providing anomalies persisted over climatology (GenCast-Persisted) or by forcing with observed SSTs (GenCast-Forced). The forecasts are compared to the European Centre for Medium-Range Weather Forecasts seasonal forecasting system, SEAS5. Our results indicate that, despite being trained at short timescales, GenCast-Persisted produces many of the correct precipitation patterns in response to El Niño and La Niña events, and several erroneous patterns in GenCast-Persisted are corrected in GenCast-Forced. The uncertainty in precipitation response, as represented by the ensemble, compares favourably to SEAS5. Whilst SEAS5 achieves superior skill in the tropics for 2-metre temperature and mean sea level pressure (MSLP), GenCast-Persisted achieves higher skill in some areas at higher latitudes, including mountainous areas, with notable improvements for MSLP in particular; this is reflected in a slightly higher correlation with the observed North Atlantic Oscillation (NAO) index. Reliability diagrams indicate that GenCast-Persisted has little skill relative to climatology, whilst GenCast-Forced produces forecasts with reliability comparable to SEAS5. These results provide an indication of the potential of MLWP models similar to GenCast for the ‘full’ seasonal forecasting problem, where the atmospheric model is coupled to ocean, land and cryosphere models.
Beyond In-Distribution Skill: Towards Robust ML Parameterisations for Non-Stationary Climate Systems
Copernicus Publications (2026)
Abstract:
Data-driven parameterisations of sub-grid processes could unlock the ability to surpass the current computational constraints of Earth system models. However, machine learning (ML) can be brittle. State-of-the-art ML approaches perform reliably on in-distribution data, exceeding human ability across a diverse range of tasks. Yet, when faced with shifts in data distribution, their performance degrades. In climate modelling, where the task is to predict the state of a non-stationary system, this is a substantial issue. We illustrate this with the ClimSim dataset, forming spatio-temporal groups to show quantitatively how even small shifts in distribution affect performance. Next, we use the theory of compositional generalisation (CG) to build models that are less susceptible to these shifts in distribution. Compositional generalisation refers to forming novel combinations of observed elementary components: the ability to decompose data into building blocks that are reused across both the in-domain and shifted domains, such that a model can capture a domain-shifted state through a set of learnt, in-domain abstractions. Inspired by these concepts, we propose various architectural and regularisation changes to standard ML parameterisations to improve generalisation. Preliminary results with sub-grid process emulators offer new insights into whether and how CG can reduce model sensitivity to domain shifts.
Evaluating emergent climate behaviour in a hybrid machine learned atmosphere -- dynamical ocean model
Copernicus Publications (2026)
Abstract:
Understanding how fast atmospheric variability shapes slow climate variability and sensitivity is a central challenge in Earth-system science. Recent advances in machine-learned (ML) atmospheric models have demonstrated remarkable skill on weather timescales, but their emergent behaviour in a fully coupled climate system is largely unexplored. We present results from a new hybrid modelling framework that couples a machine-learned atmosphere to a dynamical ocean model. We report on a set of 70-year coupled simulations (1950–2020 historical forcing and fixed-1950s control) in which the ACE2 ML climate emulator is interactively coupled to the NEMO ocean model. These experiments represent, to our knowledge, the first multi-decadal integrations of a machine-learned atmosphere interacting with a full-depth dynamical ocean. We assess the behaviour of the coupled system, with particular focus on low-frequency tropical variability and the climate response to greenhouse-gas forcing. Preliminary results indicate realistic emergent El Niño-like variability and a physically plausible climate sensitivity, suggesting that key atmosphere–ocean feedbacks can be captured within a hybrid ML–dynamical framework. These results help evaluate the possible role of entirely machine-learned components in next-generation Earth-system models.
Global climate signals of floods in near-natural rivers
Copernicus Publications (2026)