What We Don't C: Representations for scientific discovery beyond VAEs

Machine Learning and the Physical Sciences workshop at NeurIPS 2025

Authors:

Brian Rogers, Micah Bowles, Chris J. Lintott, Steve Croft

Abstract:

Accessing information in learned representations is critical for scientific discovery in high-dimensional domains. We introduce a novel method based on latent flow matching with classifier-free guidance that disentangles latent subspaces by explicitly separating information included in conditioning from information that remains in the residual representation. Across three experiments (a synthetic 2D Gaussian toy problem, colored MNIST, and the Galaxy10 astronomy dataset), we show that our method enables access to meaningful features of high-dimensional data. Our results highlight a simple yet powerful mechanism for analyzing, controlling, and repurposing latent representations, providing a pathway toward using generative models for scientific exploration of what we don't capture, consider, or catalog.

HETDEX-LOFAR Spectroscopic Redshift Catalog*

* Based on observations obtained with the Hobby–Eberly Telescope, which is a joint project of the University of Texas at Austin, the Pennsylvania State University, Ludwig-Maximilians-Universität München, and Georg-August-Universität Göttingen.

The Astrophysical Journal American Astronomical Society 978:1 (2024) 101

Authors:

Maya H Debski, Gregory R Zeimann, Gary J Hill, Donald P Schneider, Leah Morabito, Gavin Dalton, Matt J Jarvis, Erin Mentuch Cooper, Robin Ciardullo, Eric Gawiser, Nika Jurlin

Abstract:

We combine the power of blind integral field spectroscopy from the Hobby–Eberly Telescope (HET) Dark Energy Experiment (HETDEX) with sources detected by the Low Frequency Array (LOFAR) to construct the HETDEX-LOFAR Spectroscopic Redshift Catalog. Starting from the first data release of the LOFAR Two-metre Sky Survey, including a value-added catalog with photometric redshifts, we extracted 28,705 HETDEX spectra. Using an automatic classifying algorithm, we assigned each object a star, galaxy, or quasar label along with a velocity/redshift, with supplemental classifications coming from the continuum and emission-line catalogs of the internal, fourth data release from HETDEX (HDR4). We measured 9087 new redshifts; in combination with the value-added catalog, our final spectroscopic redshift sample is 9710 sources. This new catalog contains the highest substantial fraction of LOFAR galaxies with spectroscopic redshift information; it improves archival spectroscopic redshifts and facilitates research to determine the [O ii] emission properties of radio galaxies from 0.0 < z < 0.5, and the Lyα emission characteristics of both radio galaxies and quasars from 1.9 < z < 3.5. Additionally, by combining the unique properties of LOFAR and HETDEX, we are able to measure star formation rates (SFRs) and stellar masses. Using the Visible Integral-field Replicable Unit Spectrograph, we measure the emission lines of [O iii], [Ne iii], and [O ii] and evaluate line-ratio diagnostics to determine whether the emission from these galaxies is dominated by active galactic nuclei or star formation and fit a new SFR–L_150MHz relationship.

No Evidence for a Significant Evolution of M•–M* Relation in Massive Galaxies up to z ∼ 4

The Astrophysical Journal American Astronomical Society 978:1 (2024) 98

Authors:

Yang Sun, Jianwei Lyu, George H Rieke, Zhiyuan Ji, Fengwu Sun, Yongda Zhu, Andrew J Bunker, Phillip A Cargile, Chiara Circosta, Francesco D’Eugenio, Eiichi Egami, Kevin Hainline, Jakob M Helton, Pierluigi Rinaldi, Brant E Robertson, Jan Scholtz, Irene Shivaei, Meredith A Stone, Sandro Tacchella, Christina C Williams, Christopher NA Willmer, Chris Willott

Abstract:

Over the past two decades, tight correlations between black hole masses (M•) and their host galaxy properties have been firmly established for massive galaxies (with stellar mass log(M*/M⊙) ≳ 10) at low z (z < 1), indicating coevolution of supermassive black holes and galaxies. However, the situation at high z, especially beyond cosmic noon (z ≳ 2.5), is controversial. With a combination of JWST Near Infrared Camera (NIRCam)/wide field slitless spectroscopy (WFSS) from FRESCO and CONGRESS, and deep multiband NIRCam/image data from JADES in the GOODS fields, we study the black-hole-to-galaxy mass relation at z ∼ 1–4. After identifying 18 broad-line active galactic nuclei (AGNs) at 1 < z < 4 (with 8 at z > 2.5) from the WFSS data, we measure their black hole masses based on broad near-infrared lines (Paα, Paβ, and He i λ10833 Å), and constrain their stellar masses from AGN-galaxy image decomposition or spectral energy distribution decomposition. Taking into account the observational biases, the intrinsic scatter of the M•–M* relation, and the errors in mass measurements, we find no significant difference in the M•/M* ratio for 2.5 < z < 4 compared to that at lower redshifts (1 < z < 2.5), suggesting no evolution of the M•–M* relation at log(M*/M⊙) ≳ 10 up to z ∼ 4.

The Relation between AGN and Host-galaxy Properties in the JWST Era. I. Seyferts at Cosmic Noon are Obscured and Disturbed

The Astrophysical Journal American Astronomical Society 978:1 (2024) 74

Authors:

Nina Bonaventura, Jianwei Lyu, George H Rieke, Stacey Alberts, Christopher NA Willmer, Pablo G Pérez-González, Andrew J Bunker, Meredith Stone, Francesco D’Eugenio, Christina C Williams, Michael V Maseda, Chris J Willott, Zhiyuan Ji, William M Baker, Stefano Carniani, Stephane Charlot, Jacopo Chevallard, Emma Curtis-Lake, Daniel J Eisenstein, Kevin Hainline, Ryan Hausen, Erica J Nelson, Marcia J Rieke, Brant Robertson

Abstract:

The morphology of a galaxy reflects the mix of physical processes occurring within and around it, offering indirect clues to its formation and evolution. We apply both visual classification and computer vision to test the suspected connection between galaxy mergers and active galactic nucleus (AGN) activity, as evidenced by a close/merging galaxy pair, or tidal features surrounding an apparently singular system. We use JADES JWST/NIRCam imagery of a complete, multiwavelength AGN sample recently expanded with JWST/Mid-Infrared Instrument (MIRI) photometry. This 0.9–25 μm data set enables constraints on the host-galaxy morphologies of a broad range of AGN beyond z ∼ 1, including heavily obscured examples missing from previous studies. Our primary AGN sample consists of 243 lightly to highly obscured X-ray-selected AGN and 138 presumed Compton-thick, mid-infrared-bright/X-ray-faint AGN revealed by MIRI. Utilizing the shape asymmetry morphology indicator, A_S, as the metric for disturbance, we find that 88% of the Seyferts sampled are strongly spatially disturbed (A_S > 0.2). The experimental design we employ reveals a ≳3σ obscuration–merger (N_H–A_S) correlation at 0.6 < z < 2.4, and also recovers a physical distinction between the X-ray- and mid-IR-detected AGN suggestive of their link to a common evolutionary scenario. Placing the observed pattern of disturbances in the context of the other average host-galaxy properties, we conclude that mergers are common among obscured AGN. This finding presents tension with the leading model on AGN fueling that requires Seyfert AGN with subquasar luminosities (L_bol < 10⁴⁵ erg s⁻¹) to evolve only through nonmerger mechanisms.

Radio galaxies in SIMBA: a MIGHTEE comparison

Monthly Notices of the Royal Astronomical Society Oxford University Press (OUP) 536:3 (2024) 2873-2890

Authors:

Nicole L Thomas, Imogen H Whittam, Catherine L Hale, Leah K Morabito, Romeel Davé, Matt J Jarvis, Robin HW Cook