The paper "A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks", by Umut Şimşekli (Télécom Paris), Levent Sagun (EPFL), and Mert Gürbüzbalaban (Rutgers University), received a Best Paper Honorable Mention award at ICML 2019.
The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed α-stable random variable. Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur 'jumps', which force the SDE to transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the α-stable assumption, we conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and admits heavy tails. We further investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
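To make the tail-index idea concrete, here is a minimal, illustrative sketch (not the authors' code) of how one might measure the stability index α of SGD gradient noise. It uses a synthetic regression task, a small PyTorch MLP, and a simple block-based log-moment estimator for symmetric α-stable data; these choices, including the block size and the model, are assumptions made for illustration and are not necessarily the exact estimator or setups used in the paper.

import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data and a small network (stand-ins for the paper's real setups).
X = torch.randn(2048, 32)
y = torch.randn(2048, 1)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

def flat_grad(inputs, targets):
    """Gradient of the loss over the given batch, flattened to one vector."""
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])

# Gradient noise = minibatch gradient minus full-batch gradient, at fixed weights.
full_grad = flat_grad(X, y)
batch_size = 64
noise_samples = []
for _ in range(200):
    idx = torch.randint(0, X.shape[0], (batch_size,))
    noise_samples.append((flat_grad(X[idx], y[idx]) - full_grad).numpy())
noise = np.concatenate(noise_samples)  # pooled noise components

def estimate_alpha(x, block_size=50):
    """Block log-moment estimate of the stability index alpha.

    If the x_i were symmetric alpha-stable, block sums would be alpha-stable with
    scale multiplied by block_size**(1/alpha), so the gap between the mean log
    magnitude of block sums and of individual samples is (1/alpha)*log(block_size).
    """
    x = np.asarray(x)
    n_blocks = len(x) // block_size
    x = x[: n_blocks * block_size]
    block_sums = x.reshape(n_blocks, block_size).sum(axis=1)
    inv_alpha = (np.mean(np.log(np.abs(block_sums) + 1e-12))
                 - np.mean(np.log(np.abs(x) + 1e-12))) / np.log(block_size)
    return 1.0 / max(inv_alpha, 1e-6)

alpha_hat = estimate_alpha(noise)
print(f"estimated tail index alpha ~= {alpha_hat:.2f}")

An estimate close to α = 2 is consistent with Gaussian gradient noise, whereas values noticeably below 2 indicate the heavy-tailed, α-stable-like behavior the paper reports across architectures and datasets.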