Separating Epistemic and Aleatoric Uncertainties in Weather and Climate Models

(2026)

Authors:

Laura Mansfield, Hannah Christensen

Abstract:

Representing and quantifying uncertainty in physical parameterisations is a central challenge in weather and climate modelling, and approaches are often developed separately for different timescales. Here, we consider the separation of uncertainty by source using machine learning frameworks for subgrid-scale parameterisations. In this context, aleatoric uncertainty arises from internal variability in the training data, and epistemic uncertainty arises from parameters that are poorly constrained during training. Using the Lorenz 1996 system as a testbed for simplified chaotic dynamics, we treat both sources of uncertainty within a unified framework based on Bayesian Neural Networks and explore how each evolves across prediction timescales.
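A common way to separate the two sources of uncertainty with a Bayesian Neural Network is the law of total variance. The sketch below assumes two things:

- the network outputs a mean and a variance for each input;
- predictions from several posterior weight samples are available.

The average of the predicted variances is read as aleatoric uncertainty. The variance of the predicted means across weight samples is read as epistemic uncertainty. The function name `decompose_uncertainty`, the array layout and the example numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Law-of-total-variance split of BNN predictive uncertainty.

    means, variances: arrays of shape (n_weight_samples, ...) holding the
    predicted mean and (data-noise) variance from each posterior weight sample.
    """
    aleatoric = variances.mean(axis=0)  # expected data noise
    epistemic = means.var(axis=0)       # disagreement between weight samples
    return aleatoric, epistemic, aleatoric + epistemic


# Two hypothetical weight samples predicting a single subgrid tendency.
means = np.array([[1.0], [3.0]])
variances = np.array([[0.5], [1.5]])
aleatoric, epistemic, total = decompose_uncertainty(means, variances)
# aleatoric = 1.0, epistemic = 1.0, total = 2.0
```

The two terms add up to the total predictive variance, so each can be tracked separately, for example as a function of forecast lead time.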

How different are parameterisation packages really and how can we interpret stochastic perturbations?

(2026)

Authors:

Edward Groot, Hannah Christensen, Xia Sun, Kathryn Newman, Wahiba Lfarh, Romain Roehrig, Lisa Bengtsson, Julia Simonson

Abstract:

In the Model Uncertainty-Model Intercomparison Project (MUMIP), we compare parameterisation packages from different modelling centres using their single-column modelling (SCM) frameworks. We will showcase the dataset from an Indian Ocean experiment on a 0.2° grid covering one month, with about 10 million simulations of each model. These parameterised models are compared against a convection-permitting benchmark from DYAMOND under common dynamical constraints. We will show differences and similarities in precipitation patterns and physics tendencies among four models and show how these differences can be generalised. Consistent with earlier work, we find that on coarse grids that do not resolve convection, parameterisation packages tend to produce overconfident tendencies compared with the convection-permitting benchmark. Furthermore, we test several hypotheses on the MUMIP dataset to explain the differences. We use the data to explore the foundations of stochastic physical parameterisations. Does stochastic physics overcome this overconfidence, and for the right reasons? Can the stochastic perturbations be given a physically meaningful, quantitative interpretation? Can stochastic physics be used to partially overcome truncation and grid-spacing limitations?
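The sketch below makes the idea of a stochastic perturbation concrete. It applies a multiplicative perturbation in the style of the Stochastically Perturbed Parametrisation Tendencies (SPPT) scheme:

- each deterministic tendency is scaled by (1 + r);
- r is a random number that evolves in time as a first-order autoregressive (AR(1)) process.

SPPT is used here only as a familiar example of such a scheme. The function `sppt_perturb`, the parameter values and the absence of spatial correlation are illustrative assumptions, not details taken from MUMIP.

```python
import numpy as np


def sppt_perturb(tendencies, sigma=0.5, phi=0.95, seed=0):
    """Apply SPPT-style multiplicative noise to a time series of tendencies.

    tendencies: array of shape (n_steps, n_columns) of deterministic
    parameterised tendencies. One AR(1) random number per column is evolved
    in time; this sketch has no spatial correlation between columns.
    """
    rng = np.random.default_rng(seed)
    n_steps, n_cols = tendencies.shape
    r = sigma * rng.standard_normal(n_cols)  # start in the stationary distribution
    perturbed = np.empty_like(tendencies, dtype=float)
    for t in range(n_steps):
        # Bound the multiplier (1 + r) to [0, 2], a common choice in SPPT.
        multiplier = 1.0 + np.clip(r, -1.0, 1.0)
        perturbed[t] = multiplier * tendencies[t]
        # AR(1) update that keeps the stationary standard deviation at sigma.
        r = phi * r + sigma * np.sqrt(1.0 - phi**2) * rng.standard_normal(n_cols)
    return perturbed


tendencies = np.full((100, 4), 2.0)  # hypothetical constant heating rates
perturbed = sppt_perturb(tendencies)
```

Setting `sigma=0` recovers the deterministic tendencies, so stochastic and deterministic configurations of the same package can be compared on identical inputs.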

New insights into decadal climate variability in the North Atlantic revealed by data-driven dynamical models

(2026)

Authors:

Andrew Nicoll, Hannah Christensen, Chris Huntingford, Doug Smith

Abstract:

The Atlantic Multidecadal Variability (AMV) and the North Atlantic Oscillation (NAO) are the dominant modes of oceanic and atmospheric variability in the North Atlantic, respectively, and are key sources of predictability from seasonal to decadal timescales. However, the physical processes and feedback mechanisms linking the AMV and NAO, and the role of diabatic processes in these feedbacks, remain debated. We present a data-driven dynamical modelling framework which captures coupled decadal variability in AMV, NAO, and North Atlantic precipitation. Applying equation discovery methods to observational data, we identify deterministic low-order dynamical models consisting of three coupled ordinary differential equations. These models reproduce observed North Atlantic decadal variability and show robust out-of-sample predictive skill on multi-annual to decadal lead times. The resulting model dynamics include a distinct quasi-periodic 20-year oscillation consistent with a damped oceanic mode of variability. Notably, precipitation-related terms feature prominently in the low-order models, suggesting an important role for latent heat release and freshwater fluxes in mediating ocean–atmosphere interactions. We propose new feedback mechanisms between North Atlantic sea surface temperature and the NAO, with precipitation acting as a dynamical bridge. By linearising the low-order models and computing finite-time Lyapunov exponents, we find that North Atlantic precipitation is more predictable in a positive AMV phase. We then analyse several decadal prediction ensemble experiments based on initialised hindcasts and find comparable state-dependent predictability of precipitation. Overall, these results illustrate how data-driven equation discovery can provide mechanistic hypotheses and new insight beyond conventional analyses of observations and climate model simulations.
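The abstract does not name a specific equation discovery method. One example of such a method is sparse identification of nonlinear dynamics (SINDy). It regresses time derivatives onto a library of candidate terms and prunes small coefficients by sequentially thresholded least squares.

The sketch below makes these assumptions, all of them hypothetical:

- three state variables;
- a library of constant, linear and quadratic terms;
- a threshold of 0.05;
- a synthetic system invented for demonstration, not derived from observations.

All names are also hypothetical.

```python
import numpy as np


def polynomial_library(X):
    """Candidate terms: constant, linear and quadratic in the state variables."""
    n = X.shape[1]
    columns, names = [np.ones(len(X))], ["1"]
    for i in range(n):
        columns.append(X[:, i])
        names.append(f"x{i}")
    for i in range(n):
        for j in range(i, n):
            columns.append(X[:, i] * X[:, j])
            names.append(f"x{i}*x{j}")
    return np.column_stack(columns), names


def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares, the regression step of SINDy."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                # Refit each equation using only the surviving terms.
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dxdt[:, k], rcond=None)[0]
    return xi


# Synthetic demonstration: sample states and evaluate a known sparse system.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
x0, x1, x2 = X.T
dxdt = np.column_stack([
    -0.1 * x0 + 0.5 * x1,
    -0.2 * x1 + 0.3 * x2,
    0.4 * x0 - 0.1 * x2 - 0.2 * x0 * x1,
])
theta, names = polynomial_library(X)
xi = stlsq(theta, dxdt)
for k in range(3):
    terms = [f"{xi[j, k]:+.2f} {names[j]}" for j in range(len(names)) if xi[j, k] != 0]
    print(f"dx{k}/dt =", " ".join(terms))
```

On this noise-free example, the procedure recovers the seven non-zero coefficients. With derivatives estimated from observational time series, smoothing and the choice of threshold matter considerably.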

Short- to long-range climate forecasts with deep learning

(2026)

Authors:

Simon Michel, Kristian Strommen, Hannah Christensen

Abstract:

Uncertainty in projections of future regional climate change remains large, driven by structural differences among Earth System Models and the influence of internal climate variability. Existing uncertainty-reduction approaches, including emergent constraints and Bayesian variants, primarily focus on forced climate responses derived from simple aggregate metrics, thereby requiring strong assumptions and exploiting only low-dimensional climate information. Here we propose a data-driven deep-learning framework that directly forecasts spatially and monthly resolved decadal mean climatologies of surface temperature anomalies from the 2030s to the 2090s, using only recent monthly trajectories spanning 1980-2025. The training ensemble contains 265 historical+SSP2-4.5 simulations from 40 ESMs belonging to 25 model families (i.e., modelling centres), across which the cross-validation is performed. The architecture couples multi-annual to multi-decadal temporal convolutions with a spatial U-Net encoder-decoder and is evaluated on CMIP6 simulations using a leave-one-model-family-out cross-validation (LOMFO-CV) design to ensure generalisation across separately developed ESMs. Predictive uncertainty is quantified via LOMFO-CV errors, yielding conservative and reliable ranges that incorporate irreducible internal variability and systematic model shifts. To test predictive capacity beyond the CMIP6 distribution, we further evaluate the network on historical+SSP2-4.5 simulations from a recent HadGEM3-GC5 model hierarchy developed within the European Eddy-Rich ESMs (EERIE) project, the European contribution to HighResMIP2 for CMIP7. In particular, the eddy-rich GC5-HH configuration explicitly simulates mesoscale ocean dynamics that are absent in CMIP6-type models, providing a rigorous test of generalisation to richer and more realistic physical representations.
Despite these substantial differences, the network successfully reproduces warming trajectories and future climate patterns for all three model configurations (GC5-LL, GC5-MM, GC5-HH), with forecast errors largely contained within the empirically calibrated LOMFO-CV uncertainty bounds, both globally and locally. These results, notably for GC5-HH with its more realistic physics, strengthen confidence in the applicability of the framework to real-world data. When applied to observations, the extracted end-of-century global-mean surface temperature and its uncertainty range are consistent with prior estimates from Bayesian frameworks. At local scales, the network reduces uncertainty by 40% (2030s) to 30% (2090s) on average, and by up to 75% in some regions for all future decades. Importantly, these uncertainty estimates account not only for uncertainty in the forced response (as emergent constraint methods do), but also for errors associated with predicting different realisations of internal variability, providing a physically meaningful reduction of local and global climate uncertainty.
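Leave-one-model-family-out cross-validation holds out every simulation from one model family at a time. The network is therefore never tested on a family it was trained on. The sketch below shows the splitting logic and one way held-out errors could be collected per family.

The family labels, the `fit`/`predict` callables and the toy mean predictor are hypothetical placeholders for the deep-learning model. scikit-learn's `LeaveOneGroupOut` implements the same splitting strategy.

```python
from collections import defaultdict


def leave_one_family_out(families):
    """Yield (held_out_family, train_idx, test_idx), one split per family.

    families[i] is the model family (e.g. modelling centre) of simulation i.
    """
    members = defaultdict(list)
    for i, fam in enumerate(families):
        members[fam].append(i)
    for fam, test_idx in members.items():
        train_idx = [i for i, f in enumerate(families) if f != fam]
        yield fam, train_idx, test_idx


def lomfo_errors(X, y, families, fit, predict):
    """Collect prediction errors on each held-out family."""
    errors = {}
    for fam, train_idx, test_idx in leave_one_family_out(families):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors[fam] = [predict(model, X[i]) - y[i] for i in test_idx]
    return errors


# Toy usage: a "model" that always predicts the training-set mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


families = ["fam_A", "fam_A", "fam_B", "fam_C", "fam_C", "fam_C"]
X = [[0.0]] * len(families)  # inputs are unused by the toy model
y = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
errors = lomfo_errors(X, y, families, fit_mean, predict_mean)
```

Per-family errors gathered this way could then be pooled, for example into empirical quantiles, to set uncertainty ranges. The abstract does not specify exactly how its bounds are calibrated.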

Spatial Generalization Tests for Machine Learning-based Weather Models as a Requirement for Climate Predictions

(2026)

Authors:

Maren Höver, Milan Klöwer, Christian Schroeder de Witt, Hannah M Christensen

Abstract:

Machine learning-based weather prediction is revolutionizing weather forecasting by learning from present-day climate. However, generalization to other climates remains a major challenge. With melting sea ice, land-use change and increasing ocean temperatures, boundary conditions are changing. Therefore, generalization in time will likely only be possible if generalization in space is also achieved. The physics of the atmosphere is invariant in space, so a model should exhibit the same invariance to represent the real world accurately. Here, we present three test cases to evaluate whether machine learning-based weather and climate models generalize spatially, and apply them to multiple AI weather models. The tests consist of reversing the entirety of the input data and boundary conditions in latitude (Test 1), reversing them in longitude (Test 2), and rotating them by 180° in longitude (Test 3), while keeping all aspects of the simulation physically consistent. For a deterministic model that generalizes in space, each test case yields the same predictions as the baseline case once the transformation is undone, up to rounding error. With these test cases, we investigate whether data-driven models hardcode representations of spatial relationships from the training data into their latent space. We show that both current fully data-driven and hybrid general circulation models fail these tests, performing poorly and producing unphysical results. This implies that they have likely not learned underlying atmospheric physics principles, but instead local spatial relationships that depend statistically on geographical location. This calls into question the ability of such models to simulate a changing regional climate. We therefore propose that machine learning-based climate models be evaluated with our spatial tests during model development to reduce overfitting to present-day regional climate.
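The three tests can be sketched as array transformations on a latitude-longitude grid. A model passes a test if its prediction on the transformed input, once transformed back, matches the baseline prediction.

The following are assumptions for illustration:

- the grid layout `(n_lat, n_lon)`, with periodic longitude and an even number of longitude points;
- the helper names;
- both toy models.

This sketch treats scalar fields only. A physically consistent version, as the tests above require, would also need to adjust vector quantities and boundary conditions. For example, latitude reversal flips the sign of the meridional wind.

```python
import numpy as np

# Assumed layout: scalar field of shape (n_lat, n_lon), longitude periodic, n_lon even.


def flip_lat(x):
    """Test 1: reverse the field in latitude."""
    return x[::-1, :]


def flip_lon(x):
    """Test 2: reverse the field in longitude."""
    return x[:, ::-1]


def rotate_180(x):
    """Test 3: shift the field by 180 degrees in longitude."""
    return np.roll(x, x.shape[1] // 2, axis=1)


def unrotate_180(x):
    """Inverse of rotate_180."""
    return np.roll(x, -(x.shape[1] // 2), axis=1)


def passes(model, x, transform, inverse, atol=1e-6):
    """True if the prediction on transformed input, mapped back, matches the baseline."""
    return np.allclose(model(x), inverse(model(transform(x))), atol=atol)


def local_model(x):
    """Toy model: the same local rule everywhere (symmetric zonal smoothing)."""
    return 0.9 * x + 0.05 * (np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1))


def geo_model(x):
    """Toy model with latitude-dependent weights hardcoded into it."""
    weights = np.linspace(0.5, 1.5, x.shape[0])[:, None]
    return weights * x


TESTS = {
    "latitude reversal": (flip_lat, flip_lat),
    "longitude reversal": (flip_lon, flip_lon),
    "180-degree rotation": (rotate_180, unrotate_180),
}

x = np.random.default_rng(0).standard_normal((8, 16))
for name, (transform, inverse) in TESTS.items():
    print(name, passes(local_model, x, transform, inverse), passes(geo_model, x, transform, inverse))
```

Here `local_model` passes all three tests. The toy `geo_model` fails the latitude test, loosely mimicking a model that has learned location-specific relationships rather than location-independent physics.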