On Tuesday, November 15, the “Optimization and neural networks” workshop of the DSAIDIS chair was held. Permanent members and PhD students presented their research work.
Olivier Fercoq
I will present the ADAM algorithm, a well-known stochastic gradient method with an adaptive learning rate. It is based on exponential moving averages of the stochastic gradients and of their squares, used to estimate the first and second moments.
I will then explain the main ideas of its convergence proof in the case of a convex objective function. The challenges are the following: 1) the estimate of the first moment is biased; 2) the learning rate is a random variable. They are addressed by finding terms that telescope almost surely and by using the fact that the learning rate is small when the gradient estimate is noisy.
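For reference, here is a minimal NumPy sketch of the Adam update described in the abstract; the hyperparameter values and the noisy convex quadratic objective are illustrative choices, not part of the talk.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the stochastic gradient
    (first moment) and of its square (second moment), with bias correction."""
    m = beta1 * m + (1 - beta1) * grad             # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2          # biased second-moment estimate
    m_hat = m / (1 - beta1**t)                     # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                     # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # adaptive step size
    return theta, m, v

# Toy illustration on a convex quadratic f(theta) = 0.5 * ||theta||^2
# with noisy gradient estimates.
rng = np.random.default_rng(0)
theta = np.ones(5)
m, v = np.zeros(5), np.zeros(5)
for t in range(1, 501):
    stoch_grad = theta + 0.1 * rng.standard_normal(5)
    theta, m, v = adam_step(theta, stoch_grad, m, v, t)
```

The two difficulties mentioned above are visible in this update: the first-moment estimate is biased (hence the correction terms), and the effective step size alpha / (sqrt(v_hat) + eps) is itself random, becoming small when the gradient estimates are noisy.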
Maxime Lieber
In this talk, we revisit the tuning of the spectrogram window length, making the window length a continuous parameter optimizable by gradient descent instead of an empirically tuned integer-valued hyperparameter.
We first define two differentiable versions of the STFT with respect to the window length: the case where the local bin centers are fixed and independent of the window length parameter, and the more difficult case where the window length affects the position and number of bins. We then present the smooth optimization of the window length with any standard loss function. We show that this optimization can be of interest not only for any neural network-based inference system, but also for any STFT-based signal processing algorithm. We also show that the window length can be not only fixed and learned offline, but also adaptive and optimized on the fly. The contribution is mainly theoretical for the moment, but the approach is very general and should find large-scale applications in several fields.
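As a rough illustration of the first (fixed bin centers) setting, and not the speakers' implementation, the PyTorch sketch below keeps the frame positions fixed and uses a Gaussian window whose continuous width `sigma` plays the role of the window length; `sigma` is then tuned by gradient descent through the spectrogram, with a spectral-entropy loss standing in for "any standard loss function".

```python
import torch

def gaussian_window(frame_len, sigma):
    """Window of fixed support `frame_len` whose effective length is set by the
    continuous parameter `sigma` (differentiable, unlike an integer length)."""
    n = torch.arange(frame_len, dtype=torch.float32)
    center = (frame_len - 1) / 2
    return torch.exp(-0.5 * ((n - center) / sigma) ** 2)

def stft_fixed_bins(x, frame_len=512, hop=128, sigma=None):
    """STFT whose frame positions (hence bin centers) do not depend on sigma:
    only the window shape depends on the length parameter."""
    frames = x.unfold(0, frame_len, hop)            # (num_frames, frame_len)
    win = gaussian_window(frame_len, sigma)
    return torch.fft.rfft(frames * win, dim=-1)     # complex spectrogram

# Toy example: tune sigma by gradient descent on a spectrogram-based loss.
t = torch.linspace(0, 1, 4096)
x = torch.sin(2 * torch.pi * 440 * t)               # test signal
sigma = torch.tensor(50.0, requires_grad=True)      # continuous "window length"
opt = torch.optim.SGD([sigma], lr=1.0)
for _ in range(100):
    spec = stft_fixed_bins(x, sigma=sigma).abs() ** 2
    p = spec / spec.sum()
    loss = -(p * torch.log(p + 1e-12)).sum()        # spectral entropy: rewards concentration
    opt.zero_grad()
    loss.backward()
    opt.step()
```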
Enzo Tartaglione
Recent advances in deep learning optimization have shown that, with some a-posteriori information on fully trained models, it is possible to match the same performance by training only the subset of their parameters said to have “won the lottery of initialization”.
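As a rough, self-contained illustration of this lottery-ticket idea (not the speaker's experiments), the sketch below trains a dense toy model, builds a magnitude-based mask of its largest weights, rewinds the surviving weights to their initial values, and retrains only that subset.

```python
import copy
import torch
import torch.nn as nn

def magnitude_mask(model, keep_ratio=0.2):
    """Masks keeping the largest-magnitude weights of each weight matrix."""
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.dim() > 1:                            # weight matrices only
                k = max(1, int(keep_ratio * p.numel()))
                thr = p.abs().flatten().kthvalue(p.numel() - k + 1).values
                masks[name] = (p.abs() >= thr).float()
    return masks

def train(model, masks=None, steps=300):
    """Train on a toy regression task; if masks are given, gradients outside
    the mask are zeroed so that only the 'winning' subset of weights moves."""
    torch.manual_seed(0)                               # same toy data for both runs
    x = torch.randn(256, 20)
    y = x @ torch.randn(20, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        if masks is not None:
            for name, p in model.named_parameters():
                if name in masks:
                    p.grad *= masks[name]              # freeze the pruned weights
        opt.step()
    return loss.item()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
init_state = copy.deepcopy(model.state_dict())         # remember the initialization
dense_loss = train(model)                              # full training: the a-posteriori information
masks = magnitude_mask(model)                          # the "winning ticket"
model.load_state_dict(init_state)                      # rewind to the initialization
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p *= masks[name]                           # zero out the pruned weights
sparse_loss = train(model, masks=masks)                # retrain only the subset
print(dense_loss, sparse_loss)
```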
Hicham Janati
The day ended with a discussion on big data and frugal AI.