Double-descent curves in neural networks: a new perspective using Gaussian processes
Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, 38:10 (2024) 11856-11864
Abstract:
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially decreases as the number of parameters grows, then rises after reaching an optimal parameter count that is smaller than the number of data points, and then descends again in the overparameterized regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expressions allow us to explore the generalisation behavior of the corresponding kernel and GP regression. Furthermore, they offer a new interpretation of double descent in terms of the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.
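As a point of illustration (not the paper's derivation), the following numpy sketch shows a double-descent curve in the closely related random-feature regression setting, where the empirical kernel built from finite-width features approaches an NNGP-style kernel as the width grows; the data dimensions, widths, and teacher model are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher-student setup: noisy linear teacher in d dimensions.
d, n_train, n_test, noise = 20, 100, 1000, 0.1
w_star = rng.normal(size=d) / np.sqrt(d)

def make_data(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_star + noise * rng.normal(size=n)

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def relu_features(X, W):
    # One hidden layer of random ReLU features (the finite-width model).
    return np.maximum(X @ W, 0.0) / np.sqrt(W.shape[1])

for width in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    W = rng.normal(size=(d, width))
    Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
    # The empirical kernel Phi_tr @ Phi_tr.T approaches the (width-independent)
    # NNGP arc-cosine kernel as the width tends to infinity.
    a, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)  # minimum-norm readout
    print(f"width={width:5d}  test MSE={np.mean((Phi_te @ a - y_te) ** 2):.3f}")
```

Running this typically shows the test error spiking near width ≈ n_train (the interpolation threshold) and falling again at large widths, in line with the discrepancy-based interpretation summarized above.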
Coarse-grained modeling of DNA–RNA hybrids
Journal of Chemical Physics, American Institute of Physics, 160:11 (2024) 115101
Abstract:
We introduce oxNA, a new model for the simulation of DNA–RNA hybrids that is based on two previously developed coarse-grained models—oxDNA and oxRNA. The model naturally reproduces the physical properties of hybrid duplexes, including their structure, persistence length, and force-extension characteristics. By parameterizing the DNA–RNA hydrogen bonding interaction, we fit the model’s thermodynamic properties to experimental data using both average-sequence and sequence-dependent parameters. To demonstrate the model’s applicability, we provide three examples of its use—calculating the free energy profiles of hybrid strand displacement reactions, studying the resolution of a short R-loop, and simulating RNA-scaffolded wireframe origami.
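As a generic illustration of one property mentioned above (not part of the oxNA code itself), the sketch below estimates a persistence length from the decay of tangent-tangent correlations along a duplex axis; the function name, the 0.34 nm rise per base pair, and the assumption that axis coordinates are available from a trajectory frame are all illustrative.

```python
import numpy as np

def persistence_length(axis_xyz, rise_per_bp=0.34):
    """Estimate the persistence length (nm) from one frame of duplex-axis sites.

    axis_xyz: (N, 3) array of helix-axis positions taken from a
    coarse-grained trajectory frame; rise_per_bp is the assumed rise in nm.
    """
    bonds = np.diff(axis_xyz, axis=0)
    t = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)
    lags = np.arange(1, len(t) // 2)
    # <t(0) . t(s)> ~ exp(-s * rise / Lp) for a worm-like chain.
    corr = np.array([np.mean(np.einsum('ij,ij->i', t[:-s], t[s:])) for s in lags])
    keep = corr > 0  # fit only the positive, exponentially decaying part
    slope = np.polyfit(lags[keep] * rise_per_bp, np.log(corr[keep]), 1)[0]
    return -1.0 / slope
```

In practice the correlation would be averaged over many frames and molecules before fitting, rather than estimated from a single snapshot as here.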
An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem
Advances in Neural Information Processing Systems 37 (2024)
Abstract:
Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute. We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
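For concreteness, here is a minimal numpy sketch of the multitask sparse parity data-generating process the abstract refers to: each subtask is a parity over a fixed random subset of bits, subtasks are sampled with power-law frequencies, and the input concatenates a one-hot subtask code with the random bits. The specific sizes (n_tasks, n_bits, k, alpha) are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 32 subtasks, 64 random bits, parity over k = 3 of them.
n_tasks, n_bits, k, alpha = 32, 64, 3, 1.5

# Each subtask is a fixed random subset of k bit positions.
subsets = [rng.choice(n_bits, size=k, replace=False) for _ in range(n_tasks)]

# Power-law frequencies: subtask i drawn with probability proportional to (i+1)^-alpha.
probs = np.arange(1, n_tasks + 1, dtype=float) ** -alpha
probs /= probs.sum()

def sample_batch(n_samples):
    tasks = rng.choice(n_tasks, size=n_samples, p=probs)
    bits = rng.integers(0, 2, size=(n_samples, n_bits))
    # Input = one-hot subtask code concatenated with the random bits.
    x = np.concatenate([np.eye(n_tasks)[tasks], bits], axis=1).astype(np.float32)
    # Label = parity of the k bits selected by the active subtask.
    y = np.array([bits[i, subsets[t]].sum() % 2 for i, t in enumerate(tasks)])
    return x, y

x, y = sample_batch(1024)  # batches like this would be fed to a two-layer network
```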
Controlling DNA-RNA strand displacement kinetics with base distribution
(2024)