3D Cloud reconstruction through geospatially-aware Masked Autoencoders
Workshop paper at “Machine Learning and the Physical Sciences”, NeurIPS (2024)
Abstract:
Clouds play a key role in Earth's radiation balance with complex effects that introduce large uncertainties into climate models. Real-time 3D cloud data is essential for improving climate predictions. This study leverages geostationary imagery from MSG/SEVIRI and radar reflectivity measurements of cloud profiles from CloudSat/CPR to reconstruct 3D cloud structures. We first apply self-supervised learning (SSL) methods, namely Masked Autoencoders (MAE) and the geospatially-aware SatMAE, to unlabelled MSG images, and then fine-tune our models on matched image-profile pairs. Our approach outperforms state-of-the-art methods like U-Nets, and our geospatial encoding further improves prediction results, demonstrating the potential of SSL for cloud reconstruction.
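As a rough illustration of the masking step that MAE-style pre-training relies on, the sketch below splits an image into non-overlapping patches and hides a random subset of them. The 8x8 toy image, the patch size of 2, and the mask ratio of 0.75 are assumptions chosen for illustration, not settings taken from the paper, and the geospatial encoding is not modelled here.

```python
import random


def patchify(image, patch):
    """Split a 2D list (H x W) into non-overlapping patch x patch tiles, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tiles = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tiles.append([row[c:c + patch] for row in image[r:r + patch]])
    return tiles


def random_mask(num_patches, mask_ratio, seed=0):
    """Return (visible, masked) patch indices for one random masking draw."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    order = list(range(num_patches))
    rng.shuffle(order)
    n_keep = int(num_patches * (1 - mask_ratio))
    return sorted(order[:n_keep]), sorted(order[n_keep:])


# Hypothetical 8x8 single-channel image standing in for a satellite tile.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(image, patch=2)
visible, masked = random_mask(len(patches), mask_ratio=0.75)

# Only the visible patches would be passed to the encoder; the decoder
# would be trained to reconstruct the masked ones.
encoder_input = [patches[i] for i in visible]
```

In a full pipeline, the pre-trained encoder would then be fine-tuned on the matched image-profile pairs described in the abstract.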
Multifractal Analysis for Evaluating the Representation of Clouds in Global Kilometer-Scale Models
Geophysical Research Letters, 51 (2024)
Abstract:
Clouds are one of the largest sources of uncertainty in climate predictions. Global km-scale models need to simulate clouds and precipitation accurately to predict future climates. To isolate issues in their representation of clouds, models need to be thoroughly evaluated with observations. Here, we introduce multifractal analysis as a method for evaluating km-scale simulations. We apply it to outgoing longwave radiation fields to investigate structural differences between observed and simulated anvil clouds. We compute fractal parameters that compactly characterize the scaling behavior of clouds and can be compared across simulations and observations. We use this method to evaluate the nextGEMS ICON simulations via comparison with observations from the geostationary satellite GOES-16. We find that multifractal scaling exponents in the ICON model are significantly lower than in observations. We conclude that too much variability is contained at small scales (<100 km), leading to less organized convection and smaller, isolated anvils.
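One common way to obtain scaling exponents is through structure functions, S_q(r) = <|f(x + r) - f(x)|^q>, which scale as r^zeta(q) for a scaling field. The sketch below estimates zeta(q) for a 1D signal by a least-squares fit in log-log space. It is a generic illustration under that assumption and may differ from the estimator used in the paper; the function names and the synthetic ramp signal are hypothetical.

```python
import math


def structure_function(signal, lag, q):
    """Mean of |f(x + lag) - f(x)|^q over all valid positions x."""
    diffs = [abs(signal[i + lag] - signal[i]) ** q
             for i in range(len(signal) - lag)]
    return sum(diffs) / len(diffs)


def scaling_exponent(signal, lags, q):
    """Slope of log S_q(r) against log r, i.e. an estimate of zeta(q)."""
    xs = [math.log(r) for r in lags]
    ys = [math.log(structure_function(signal, r, q)) for r in lags]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic check: for a linear ramp, |f(x + r) - f(x)| = r, so zeta(q) = q.
ramp = [float(i) for i in range(256)]
lags = [1, 2, 4, 8, 16]
zeta_1 = scaling_exponent(ramp, lags, q=1)
zeta_2 = scaling_exponent(ramp, lags, q=2)
```

Applied to a transect of an outgoing longwave radiation field, exponents estimated this way for a model and for observations could be compared directly, which is the kind of comparison the abstract describes.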
A Machine Learning Approach for Predicting Essentiality of Metabolic Genes
In: Braman, J.C. (eds) Synthetic Biology. Methods in Molecular Biology, vol 2760 (2024)
Abstract:
The identification of essential genes is a key challenge in systems and synthetic biology, particularly for engineering metabolic pathways that convert feedstocks into valuable products. Assessment of gene essentiality at a genome scale requires large and costly growth assays of knockout strains. Here we describe a strategy to predict the essentiality of metabolic genes using binary classification algorithms. The approach combines elements from genome-scale metabolic models, directed graphs, and machine learning into a predictive model that can be trained on small knockout datasets. We demonstrate the efficacy of this approach using the most complete metabolic model of Escherichia coli and various machine learning algorithms for binary classification.
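To make the binary classification step concrete, the sketch below trains a small logistic regression by gradient descent on a handful of labelled examples. Logistic regression is only one possible choice of classifier, and the two features per gene and the six labelled examples are invented toy values for illustration, not data from the chapter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_feat = len(features), len(features[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (essential) or 0 (non-essential) for one feature vector."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy data: two graph-derived features per gene,
# label 1 = essential, 0 = non-essential.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95],
     [0.1, 0.2], [0.2, 0.1], [0.15, 0.05]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice, a held-out set of knockout strains would be needed to judge how well such a classifier generalizes; the toy example only checks that it fits its training data.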
Prediction of gene essentiality using machine learning and genome-scale metabolic models
IFAC-PapersOnLine 55:23 (2022)
Abstract:
The identification of essential genes, i.e. those that impair cell survival when deleted, requires large growth assays of knock-out strains. The complexity and cost of such experiments have triggered a growing interest in computational methods for prediction of gene essentiality. In the case of metabolic genes, Flux Balance Analysis (FBA) is widely employed to predict essentiality under the assumption that cells maximize their growth rate. However, this approach assumes that knock-out strains optimize the same objectives as the wild-type, which excludes cases in which deletions cause large physiological changes to meet other objectives for survival. Here, we resolve this limitation with a novel machine learning approach that predicts essentiality directly from wild-type flux distributions. We first project the wild-type FBA solution onto a mass flow graph, a digraph with reactions as nodes and edge weights proportional to the mass transfer between reactions, and then train binary classifiers on the connectivity of graph nodes. We demonstrate the efficacy of this approach using the most complete metabolic model of Escherichia coli, achieving near state-of-the-art prediction accuracy for essential genes. Our approach suggests that wild-type FBA solutions contain enough information to predict essentiality, without the need to assume optimality of deletion strains.
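The projection onto a mass flow graph can be sketched as follows. The weighting used here is one plausible reading of "proportional to the mass transfer between reactions": for each metabolite, the flow from a producing reaction i to a consuming reaction j is taken as (production by i) x (consumption by j) / (total production). That formula, the five-reaction toy network, and the choice of node features are assumptions for illustration and are not confirmed by the abstract.

```python
def mass_flow_graph(S, v):
    """Weighted adjacency matrix M, where M[i][j] is the mass flow from
    reaction i to reaction j.

    S: stoichiometric matrix (metabolites x reactions), as nested lists.
    v: flux vector, one entry per reaction (negative = running in reverse).
    """
    n_rxn = len(v)
    M = [[0.0] * n_rxn for _ in range(n_rxn)]
    for row in S:
        net = [row[j] * v[j] for j in range(n_rxn)]
        produced = [max(x, 0.0) for x in net]   # flux producing this metabolite
        consumed = [max(-x, 0.0) for x in net]  # flux consuming this metabolite
        total = sum(produced)
        if total == 0:
            continue  # metabolite carries no flux
        for i in range(n_rxn):
            if produced[i] == 0:
                continue
            for j in range(n_rxn):
                M[i][j] += produced[i] * consumed[j] / total
    return M


def node_features(M):
    """Per-reaction connectivity: (in-strength, out-strength, in-degree, out-degree)."""
    n = len(M)
    feats = []
    for i in range(n):
        col = [M[j][i] for j in range(n)]
        feats.append((sum(col), sum(M[i]),
                      sum(1 for x in col if x > 0),
                      sum(1 for x in M[i] if x > 0)))
    return feats


# Hypothetical toy network: R1: -> A, R2: A -> B, R3: A -> C, R4: B ->, R5: C ->
S = [[1, -1, -1, 0, 0],   # metabolite A
     [0, 1, 0, -1, 0],    # metabolite B
     [0, 0, 1, 0, -1]]    # metabolite C
v = [3.0, 2.0, 1.0, 2.0, 1.0]  # a steady-state flux vector (S v = 0)

M = mass_flow_graph(S, v)
features = node_features(M)
```

Connectivity features of this kind, computed per reaction and mapped to genes, would then serve as inputs to the binary classifiers.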