Ard Louis: Publications

Deep neural networks have an inbuilt Occam’s razor

Nature Communications Nature Research 16:1 (2025) 220

Authors:

Chris Mingard, Henry Rees, Guillermo Valle-Pérez, Ard A Louis

Abstract:

The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components for supervised learning, we apply a Bayesian picture based on the functions expressed by a DNN. The prior over functions is determined by the network architecture, which we vary by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. Combining this with the prior yields an accurate prediction for the posterior, measured for DNNs trained with stochastic gradient descent. This analysis shows that structured data, together with a specific Occam’s razor-like inductive bias towards (Kolmogorov) simple functions that exactly counteracts the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
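The idea of a prior over functions set by the architecture can be illustrated with a minimal sketch: draw random parameters for a small network, record which Boolean function it expresses on all inputs, and count how often each function appears. Everything here is an assumption made for illustration and is not taken from the paper: the tanh fully connected network, the Gaussian parameter distribution, the names `random_network_function` and `estimate_prior`, and the use of the weight scale `sigma_w` as the knob that changes the prior.

```python
import itertools
import math
import random
from collections import Counter


def random_network_function(n_inputs, width, depth, sigma_w, sigma_b, rng):
    """Sample i.i.d. Gaussian parameters for a tanh network and return the
    Boolean function it expresses, as a bit string over all 2**n_inputs inputs."""
    sizes = [n_inputs] + [width] * depth + [1]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        std = sigma_w / math.sqrt(fan_in)  # illustrative 1/sqrt(fan_in) scaling
        weights = [[rng.gauss(0.0, std) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [rng.gauss(0.0, sigma_b) for _ in range(fan_out)]
        layers.append((weights, biases))

    bits = []
    for x in itertools.product([0.0, 1.0], repeat=n_inputs):
        h = list(x)
        for idx, (weights, biases) in enumerate(layers):
            z = [sum(w * v for w, v in zip(row, h)) + b
                 for row, b in zip(weights, biases)]
            # tanh on hidden layers, raw pre-activation on the output layer
            h = z if idx == len(layers) - 1 else [math.tanh(v) for v in z]
        bits.append('1' if h[0] > 0 else '0')  # threshold the single output
    return ''.join(bits)


def estimate_prior(n_samples=2000, n_inputs=3, width=8, depth=2,
                   sigma_w=1.0, sigma_b=0.1, seed=0):
    """Empirical frequency of each Boolean function under random parameters."""
    rng = random.Random(seed)
    return Counter(
        random_network_function(n_inputs, width, depth, sigma_w, sigma_b, rng)
        for _ in range(n_samples)
    )


prior = estimate_prior()
for function, count in prior.most_common(5):
    print(function, count / sum(prior.values()))
```

Rerunning `estimate_prior` with a different `sigma_w` gives a rough feel for how the distribution over functions depends on the parameter scale in this toy setting.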

Visualising Feature Learning in Deep Neural Networks by Diagonalizing the Forward Feature Map

(2024)

Authors:

Yoonsoo Nam, Chris Mingard, Seok Hyeong Lee, Soufiane Hayou, Ard Louis

An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem

Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Curran Associates 37 (2024)

Authors:

Yoonsoo Nam, Nayara Fonseca, Sh Lee, Christopher Mingard, Ard A Louis

Abstract:

Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute. We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
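As a rough illustration of the dataset mentioned in the abstract, the sketch below generates multitask sparse parity examples with power-law task frequencies. The abstract does not give the construction, so the details are assumptions that follow the usual setup for this problem: one-hot control bits select a task, each task is the parity of a fixed subset of `k` task bits, and task `i` is drawn with probability proportional to `i ** -(alpha + 1)`. The function names and parameter values are hypothetical.

```python
import random


def make_tasks(n_tasks, n_bits, k, rng):
    """Assign each task a fixed random subset of k task-bit indices."""
    return [sorted(rng.sample(range(n_bits), k)) for _ in range(n_tasks)]


def power_law_weights(n_tasks, alpha):
    """Normalised task probabilities p_i proportional to i ** -(alpha + 1)."""
    raw = [(i + 1) ** -(alpha + 1.0) for i in range(n_tasks)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_example(tasks, weights, n_bits, rng):
    """Draw one (input, label, task index) triple."""
    t = rng.choices(range(len(tasks)), weights=weights, k=1)[0]
    control = [0] * len(tasks)
    control[t] = 1  # one-hot control bits identify the active task
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    label = sum(bits[i] for i in tasks[t]) % 2  # parity of the task's subset
    return control + bits, label, t


rng = random.Random(0)
tasks = make_tasks(n_tasks=5, n_bits=20, k=3, rng=rng)
weights = power_law_weights(n_tasks=5, alpha=0.4)
x, y, t = sample_example(tasks, weights, n_bits=20, rng=rng)
print(t, y, x)
```

Because frequent tasks dominate the data, a model trained on such samples sees the rarer tasks only occasionally, which is the setting in which the abstract describes skills emerging one after another.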

Exploiting the equivalence between quantum neural networks and perceptrons

(2024)

Authors:

Chris Mingard, Jessica Pointing, Charles London, Yoonsoo Nam, Ard A Louis

Exploring Simplicity Bias in 1D Dynamical Systems

Entropy MDPI 26:5 (2024) 426

Authors:

Kamal Dingle, Mohammad Alaskandarani, Boumediene Hamzi, Ard A Louis

Abstract:

Arguments inspired by algorithmic information theory predict an inverse relation between the probability and complexity of output patterns in a wide range of input-output maps. This phenomenon is known as simplicity bias. By viewing the parameters of dynamical systems as inputs, and the resulting (digitised) trajectories as outputs, we study simplicity bias in the logistic map, Gauss map, sine map, Bernoulli map, and tent map. We find that the logistic map, Gauss map, and sine map all exhibit simplicity bias upon sampling of map initial values and parameter values, but the Bernoulli map and tent map do not. The simplicity bias upper bound on the output pattern probability is used to make a priori predictions regarding the probability of output patterns. In some cases, the predictions are surprisingly accurate, given that almost no details of the underlying dynamical systems are assumed. More generally, we argue that studying probability-complexity relationships may be a useful tool when studying patterns in dynamical systems.
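The probability-complexity relationship described above can be explored with a minimal sketch for the logistic map: sample a parameter and an initial value, digitise the trajectory into a binary string, and compare how often each string occurs with a crude complexity estimate. Several choices here are assumptions for illustration rather than the paper's method: the 0.5 digitisation threshold, the discarded transient of 50 steps, the trajectory length of 25, uniform sampling of `mu` on [0, 4], and the use of zlib compressed length as a stand-in for Kolmogorov complexity.

```python
import random
import zlib
from collections import Counter


def digitised_trajectory(mu, x0, n_steps=25, n_skip=50):
    """Iterate x -> mu * x * (1 - x) and digitise: '1' if x >= 0.5 else '0'."""
    x = x0
    for _ in range(n_skip):  # discard an initial transient (assumption)
        x = mu * x * (1.0 - x)
    bits = []
    for _ in range(n_steps):
        x = mu * x * (1.0 - x)
        bits.append('1' if x >= 0.5 else '0')
    return ''.join(bits)


def complexity_proxy(pattern):
    """Compressed length in bytes; a rough proxy, not true Kolmogorov complexity."""
    return len(zlib.compress(pattern.encode('ascii'), 9))


def sample_patterns(n_samples=5000, seed=0):
    """Count output patterns under uniform sampling of mu and x0."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        mu = rng.uniform(0.0, 4.0)
        x0 = rng.uniform(0.0, 1.0)
        counts[digitised_trajectory(mu, x0)] += 1
    return counts


counts = sample_patterns()
total = sum(counts.values())
for pattern, count in counts.most_common(5):
    print(pattern, count / total, complexity_proxy(pattern))
```

Plotting the empirical probability of each pattern against `complexity_proxy` on a log scale is one way to check by eye whether frequent patterns tend to be the more compressible ones.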
