From Classical Statistics to Modern Deep Learning

26 May
Seminars and colloquia
Venue
Online
To receive Zoom room links, send an empty email to request.zoom.ox.ml.and.physics [AT] gmail [DOT] com
Speaker(s)

Mikhail Belkin
University of California, San Diego

Seminar series
Machine learning and physics
Abstract:

Recent empirical successes of deep learning have exposed significant gaps in our fundamental understanding of learning and optimization mechanisms. Modern best practices for model selection are in direct contradiction to the methodologies suggested by classical analyses. Similarly, the efficiency of the SGD-based local methods used to train modern models appears at odds with standard intuitions about optimization. First, I will present evidence, empirical and mathematical, that necessitates revisiting classical statistical notions such as over-fitting. I will then discuss the emerging understanding of generalization and, in particular, the "double descent" risk curve, which extends the classical U-shaped generalization curve beyond the point of interpolation. Second, I will discuss why the landscapes of over-parameterized neural networks are generically never convex, even locally. Instead, they satisfy the Polyak-Lojasiewicz (PL) condition across most of the parameter space, which provides a powerful framework for optimization in general over-parameterized models and allows SGD-type methods to converge to a global minimum. While our understanding has grown significantly in the last few years, a key piece of the puzzle remains: how does optimization align with statistics to form the complete mathematical picture of modern ML?
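For reference, the Polyak-Lojasiewicz (PL) condition mentioned above can be stated as follows: a loss L with infimum L* is mu-PL on a region of parameter space if

```latex
\frac{1}{2}\,\bigl\|\nabla L(w)\bigr\|^2 \;\ge\; \mu\,\bigl(L(w) - L^*\bigr), \qquad \mu > 0,
```

for all w in that region. Under this condition, gradient descent with a suitable step size converges linearly to a global minimizer even though L need not be convex, which is the sense in which PL substitutes for convexity in over-parameterized models.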

About the Speaker:

Mikhail Belkin is a Professor at the Halicioglu Data Science Institute, University of California, San Diego. Prior to that, he was a Professor in the Department of Computer Science and Engineering and the Department of Statistics at the Ohio State University. He received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are in the theory and applications of machine learning and data analysis. His well-known work includes the widely used Laplacian Eigenmaps, Graph and Manifold Regularization algorithms, which brought ideas from classical differential and spectral geometry to data analysis, as well as Polynomial Learning of Distribution Families, which used semi-algebraic geometry for provable learning of Gaussian mixture distributions. His recent work has been concerned with understanding remarkable statistical phenomena observed in deep learning. One of the key recent findings is the "double descent" risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation. Mikhail Belkin is a recipient of an NSF CAREER Award and a number of best paper and other awards. He served on the editorial boards of IEEE Transactions on Pattern Analysis and Machine Intelligence and the Journal of Machine Learning Research, and is currently serving on the editorial board of the SIAM Journal on Mathematics of Data Science.