Forced Component Estimation Statistical Method Intercomparison Project (ForceSMIP)

(2025)

Authors:

Robert CJ Wills, Clara Deser, Karen A McKinnon, Adam Phillips, Stephen Po-Chedley, Sebastian Sippel, Anna L Merrifield, Constantin Bône, Céline Bonfils, Gustau Camps-Valls, Stephen Cropper, Charlotte Connolly, Shiheng Duan, Homer Durand, Alexander Feigin, MA Fernandez, Guillaume Gastineau, Andrei Gavrilov, Emily Gordon, Moritz Günther, Maren Höver, Sergey Kravtsov, Yan-Ning Kuo, Justin Lien, Gavin D Madakumbura, Nathan Mankovich, Matthew Newman, Jamin Rader, Jia-Rui Shi, Sang-Ik Shin, Gherardo Varando

Vertically Recurrent Neural Networks for Sub‐Grid Parameterization

Journal of Advances in Modeling Earth Systems Wiley 17:6 (2025) e2024MS004833

Authors:

P Ukkonen, M Chantry

Abstract:

Machine learning has the potential to improve the physical realism and/or computational efficiency of parameterizations. A typical approach has been to feed concatenated vertical profiles to a dense neural network. However, feed‐forward networks lack the connections to propagate information sequentially through the vertical column. Here we examine whether predictions can be improved by instead traversing the column with recurrent neural networks (RNNs) such as Long Short‐Term Memory networks (LSTMs). This method encodes physical priors (locality) and uses parameters more efficiently. Firstly, we test RNN‐based radiation emulators in the Integrated Forecasting System. We achieve near‐perfect offline accuracy, and the forecast skill of a suite of global weather simulations using the emulator is for the most part statistically indistinguishable from that of reference runs. But can radiation emulators provide both high accuracy and a speed‐up? We find that optimized, state‐of‐the‐art radiation code on CPU is generally faster than RNN‐based emulators on GPU, although the latter can be more energy efficient. To test the method more broadly, and to explore recent challenges in parameterization, we also adapt it to data sets from other studies. RNNs outperform reference feed‐forward networks in emulating gravity waves and, when combined with horizontal convolutions, in non‐local unified parameterization. In emulation of moist physics with memory, the RNNs have offline accuracy similar to that of ResNets, the previous state‐of‐the‐art. However, the RNNs are more efficient, and more stable in autoregressive semi‐prognostic tests. Multi‐step autoregressive training improves performance in these tests and enables a latent representation of convective memory. Recently proposed linearly recurrent models achieve similar performance to LSTMs.
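The idea of traversing the vertical column with a recurrence can be illustrated with a minimal sketch. This is not the authors' architecture: it uses a plain Elman-style update with scalar weights rather than an LSTM, and the function name `traverse_column` and the weights `w_x`, `w_h`, `b` are hypothetical values chosen only for illustration.

```python
import math


def traverse_column(profile, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar recurrence from the first level to the last.

    profile: list of floats, one input value per vertical level
             (assumed ordered top to bottom).
    Returns one hidden state per level. The state at level k depends
    on the inputs at levels 0..k, so information propagates
    sequentially through the column in one direction.
    """
    h = 0.0  # hypothetical initial state above the top level
    states = []
    for x in profile:
        # Elman-style update: mix the local input with the state
        # carried from the level above.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


if __name__ == "__main__":
    column = [1.0, 0.5, -0.2, 0.3]
    print(traverse_column(column))
```

In this one-directional sketch, perturbing the top-level input changes the state at every level below it, while perturbing the bottom-level input changes only the last state. A dense network fed the concatenated profile has no such built-in ordering.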

The Link between Gulf Stream Precipitation Extremes and European Blocking in General Circulation Models and the Role of Horizontal Resolution

Journal of Climate American Meteorological Society (2025)

Authors:

Kristian Strommen, Simon LL Michel, Hannah M Christensen

Abstract:

Past studies show that coupled model biases in European blocking and North Atlantic eddy-driven jet variability decrease as one increases the horizontal resolution in the atmospheric and oceanic model components, but it remains unclear if atmospheric or oceanic resolution plays the greater role, and why. Here, following recent work by Schemm et al., we leverage a large multi-model ensemble to show that a coupled model’s ability to simulate extreme Gulf Stream precipitation is tightly linked to its simulated frequency of European blocking and northern jet excursions. Furthermore, the reduced biases in blocking and jet variability are consistent with better resolved precipitation extrema in high-resolution models. Analysis supports a hypothesis that models which simulate more extreme precipitation can generate more strongly poleward propagating cyclones and more intense anticyclonic anomalies due to the stronger latent heat release occurring during extreme events. By contrast, typical North Atlantic SST biases are found to share only a weak or negligible relationship with blocking and jet biases. Finally, while previous studies have used a comparison between coupled models and models run with prescribed SSTs to argue for the role of ocean resolution, we emphasise here that models run with prescribed SSTs experience greatly reduced precipitation extremes due to their excessive thermal damping, making it unclear if such a comparison is meaningful. Instead, we speculate that most of the reduction in coupled model biases may actually be due to increased atmospheric resolution leading to better resolved convection.

How to Derive Skill from the Fractions Skill Score

Monthly Weather Review American Meteorological Society 153:6 (2025) 1021-1033

Authors:

Bobby Antonio, Laurence Aitchison

Abstract:

The fractions skill score (FSS) is a widely used metric for assessing forecast skill, with applications ranging from precipitation to volcanic ash forecasts. By evaluating the fraction of grid squares exceeding a threshold in a neighborhood, the intuition is that it can avoid the pitfalls of pixelwise comparisons and identify length scales at which a forecast has skill. The FSS is typically interpreted relative to a “useful” criterion, where a forecast is considered skillful if its score exceeds a simple reference score. However, the typical reference score used is problematic, since it is not derived in a way that provides obvious meaning, does not scale with neighborhood size, and may not be exceeded by forecasts that have skill. We, therefore, provide a new method to determine forecast skill from the FSS, by deriving an expression for the FSS achieved by a random forecast, which provides a more robust and meaningful reference score to compare with. Through illustrative examples, we show that this new method considerably changes the length scales at which a forecast would be regarded as skillful and reveals subtleties in how the FSS should be interpreted.

Significance Statement:

Forecast verification metrics are crucial to assess accuracy and identify where forecasts can be improved. In this work, we investigate a popular verification metric, the fractions skill score, and derive a more robust method to decide if a forecast has sufficiently high skill. This new method significantly improves the quality of insights that can be drawn from this score.
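For readers unfamiliar with the metric, the following is a minimal sketch of how an FSS is commonly computed: threshold both fields, take the fraction of exceeding cells in a square neighborhood around each grid cell, and compare the two fraction fields. It assumes a strict `>` threshold, an odd window width `n`, and zero padding at the domain edges. The function names are hypothetical. It does not reproduce the random-forecast reference score derived in the paper.

```python
def neighborhood_fractions(binary, n):
    """Fraction of exceeding cells in the n x n window around each cell.

    binary: 2-D list of 0/1 values. n: odd window width (assumption).
    Cells outside the domain count as zero (zero padding, assumption).
    """
    rows, cols = len(binary), len(binary[0])
    half = n // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    a, b = i + di, j + dj
                    if 0 <= a < rows and 0 <= b < cols:
                        total += binary[a][b]
            out[i][j] = total / (n * n)
    return out


def fss(forecast, observed, threshold, n):
    """Fractions skill score for one threshold and neighborhood width."""
    f_bin = [[1 if v > threshold else 0 for v in row] for row in forecast]
    o_bin = [[1 if v > threshold else 0 for v in row] for row in observed]
    pf = neighborhood_fractions(f_bin, n)
    po = neighborhood_fractions(o_bin, n)
    # Squared difference of the fraction fields, and its largest
    # possible value given the two fields.
    num = sum((a - b) ** 2 for rf, ro in zip(pf, po) for a, b in zip(rf, ro))
    den = sum(a ** 2 + b ** 2 for rf, ro in zip(pf, po) for a, b in zip(rf, ro))
    if den == 0:
        return float("nan")  # neither field exceeds the threshold
    return 1.0 - num / den


if __name__ == "__main__":
    # One observed event and a forecast displaced by one grid cell.
    obs = [[0.0] * 5 for _ in range(5)]
    fc = [[0.0] * 5 for _ in range(5)]
    obs[2][2] = 1.0
    fc[2][3] = 1.0
    for width in (1, 3, 5):
        print(width, fss(fc, obs, threshold=0.5, n=width))
```

In this displaced-event example the score is zero for pixelwise comparison (`n=1`) and rises as the neighborhood widens, which is the length-scale behavior the abstract refers to.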