Introduction to the Special issue on symbolic regression in the physical sciences
Philosophical Transactions of the Royal Society A Mathematical Physical and Engineering Sciences The Royal Society 384:2317 (2026) 20240600
Abstract:
Symbolic regression (SR) has emerged as a powerful method for uncovering interpretable mathematical relationships from data, offering a novel route to both scientific discovery and efficient empirical modelling. This article introduces the Special issue on symbolic regression for the physical sciences, motivated by the Royal Society discussion meeting held in April 2025. The contributions collected here span applications from automated equation discovery and emergent-phenomena modelling to the construction of compact emulators for computationally expensive simulations. The introductory review outlines the conceptual foundations of SR, contrasts it with conventional regression approaches and surveys its main use cases in the physical sciences, including the derivation of effective theories, empirical functional forms and surrogate models. We summarize methodological considerations such as search-space design, operator selection, complexity control, feature selection and integration with modern AI approaches. We also highlight ongoing challenges, including scalability, robustness to noise, overfitting and computational complexity. Finally, we emphasize emerging directions, particularly the incorporation of symmetry constraints, asymptotic behaviour and other theoretical information. Taken together, the papers in this Special issue illustrate the accelerating progress of SR and its growing relevance across the physical sciences. This article is part of the discussion meeting issue ‘Symbolic regression in the physical sciences’.
The Velocity Field Olympics: Assessing velocity field reconstructions with direct distance tracers
Monthly Notices of the Royal Astronomical Society Oxford University Press (OUP) (2025) staf1960
Abstract:
The peculiar velocity field of the local Universe provides direct insights into its matter distribution and the underlying theory of gravity, and is essential in cosmological analyses for modelling deviations from the Hubble flow. Numerous methods have been developed to reconstruct the density and velocity fields at z ≲ 0.05, typically constrained by redshift-space galaxy positions or by direct distance tracers such as the Tully–Fisher relation, the fundamental plane, or Type Ia supernovae. We introduce a validation framework to evaluate the accuracy of these reconstructions against catalogues of direct distance tracers. Our framework assesses the goodness-of-fit of each reconstruction using Bayesian evidence, residual redshift discrepancies, velocity scaling, and the need for external bulk flows. Applying this framework to a suite of reconstructions, including those derived from the Bayesian Origin Reconstruction from Galaxies (BORG) algorithm and from linear theory, we find that the non-linear BORG reconstruction consistently outperforms others. We highlight the utility of such a comparative approach for supernova or gravitational wave cosmological studies, where selecting an optimal peculiar velocity model is essential. Additionally, we present calibrated bulk flow curves predicted by the reconstructions and perform a density–velocity cross-correlation using a linear theory reconstruction to constrain the growth factor, yielding S8 = 0.793 ± 0.035. The result is in good agreement with both weak lensing and Planck, but is in strong disagreement with some peculiar velocity studies.
Creating halos with autoregressive multistage networks
Physical Review D American Physical Society 112:10 (2025) 103503
Abstract:
To maximize the amount of information extracted from cosmological datasets, simulations that accurately represent these observations are necessary. However, traditional simulations that evolve particles under gravity by estimating particle-particle interactions (N-body simulations) are computationally expensive and prohibitive to scale to the large volumes and resolutions necessary for the upcoming datasets. Moreover, modeling the distribution of galaxies typically involves identifying virialized dark matter halos, which is also a time- and memory-consuming process for large N-body simulations, further exacerbating the computational cost. In this study, we introduce CHARM, a novel method for creating mock halo catalogs by matching the spatial, mass, and velocity statistics of halos directly from the large-scale distribution of the dark matter density field. We develop multistage neural spline flow-based networks to learn this mapping at redshift z = 0.5 directly with computationally cheaper low-resolution particle mesh simulations instead of relying on the high-resolution N-body simulations. We show that the mock halo catalogs and painted galaxy catalogs have the same statistical properties as obtained from N-body simulations in both real space and redshift space. Finally, we use these mock catalogs for cosmological inference using redshift-space galaxy power spectrum, bispectrum, and wavelet-based statistics using simulation-based inference, performing the first inference with accelerated forward model simulations and finding unbiased cosmological constraints with well-calibrated posteriors.
syren-baryon: Analytic emulators for the impact of baryons on the matter power spectrum
Astronomy & Astrophysics EDP Sciences 701 (2025) ARTN A284
Abstract:
Context. Baryonic physics has a considerable impact on the distribution of matter in our Universe on scales probed by current and future cosmological surveys, acting as a key systematic in such analyses. Aims. We seek simple symbolic parametrisations for the impact of baryonic physics on the matter power spectrum for a range of physically motivated models, as a function of wavenumber, redshift, cosmology, and parameters controlling the baryonic feedback. Methods. We used symbolic regression to construct analytic approximations for the ratio of the matter power spectrum in the presence of baryons to that without such effects. We obtained separate functions for each of four distinct sub-grid prescriptions of baryonic physics from the CAMELS suite of hydrodynamical simulations (Astrid, IllustrisTNG, SIMBA, and Swift-EAGLE) as well as for a baryonification algorithm. We also provide functions that describe the uncertainty on these predictions, due to both the stochastic nature of baryonic physics and the errors on our fits. Results. The error on our approximations to the hydrodynamical simulations is comparable to the sample variance estimated through varying initial conditions, and our baryonification expression has a root mean squared error of better than one percent, although this increases on small scales. These errors are comparable to those of previous numerical emulators for these models. Our expressions are constrained to have the physically correct behaviour on large scales and at high redshift. Due to their analytic form, we are able to directly interpret the impact of varying cosmology and feedback parameters, and we can identify parameters that have little to no effect. Conclusions. Each function is based on a different implementation of baryonic physics, and can therefore be used to discriminate between these models when applied to real data. We provide publicly available code for all symbolic approximations found.
SYREN-NEW: Precise formulae for the linear and nonlinear matter power spectra with massive neutrinos and dynamical dark energy
Astronomy & Astrophysics EDP Sciences 698 (2025) ARTN A1