13:30 Welcome
13:45 Olivier Fercoq, « On the convergence of the ADAM algorithm »
I will present the ADAM algorithm, a well-known stochastic gradient method with an adaptive learning rate. It is based on exponential moving averages of the stochastic gradients and of their squares, which estimate the first and second moments.
Then I will explain the main ideas of its convergence proof in the case of a convex objective function. The challenges are the following: 1) the estimate of the first moment is biased; 2) the learning rate is a random variable. They are addressed by finding terms that telescope almost surely and by using the fact that the learning rate is small when the gradient estimate is noisy.
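For reference, here is a minimal NumPy sketch of the standard ADAM update (not the speaker's code): exponential moving averages of the gradient and of its square estimate the two moments, the bias correction relates to challenge 1, and the data-dependent effective step size relates to challenge 2. Variable names and hyperparameter values are the usual defaults, chosen here for illustration.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the stochastic gradient and its square
    # (estimates of the first and second moments).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: the moving averages are biased towards zero during the
    # first iterations (the biased first-moment estimate of challenge 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The effective step size lr / (sqrt(v_hat) + eps) depends on the stochastic
    # gradients, so it is itself a random variable (challenge 2); it shrinks
    # when the gradient estimates are large or noisy.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative usage on one parameter vector with placeholder gradients.
theta, m, v = np.zeros(10), np.zeros(10), np.zeros(10)
for t in range(1, 101):
    grad = np.random.randn(10)
    theta, m, v = adam_step(theta, grad, m, v, t)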
14:15 Maxime Leiber, « Differentiable STFT with respect to the window length: optimizing STFT window length by gradient descent »
In this talk, we revisit the tuning of the spectrogram window length, making the window length a continuous parameter optimizable by gradient descent instead of an empirically tuned integer-valued hyperparameter.
We first define two differentiable versions of the STFT w.r.t. the window length: one for the case where the local bin centers are fixed and independent of the window length parameter, and one for the more difficult case where the window length affects the position and number of bins. We then present the smooth optimization of the window length with any standard loss function. We show that this optimization is of interest not only for any neural-network-based inference system, but also for any STFT-based signal processing algorithm. We also show that the window length can not only be fixed and learned offline, but also be made adaptive and optimized on the fly. The contribution is mainly theoretical for the moment, but the approach is very general and is expected to have large-scale applications in several fields.
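As an illustration of the first setting (fixed bin centers), here is a minimal PyTorch sketch in which the continuous width of a Gaussian window plays the role of the window length and is updated by gradient descent; the signal, the toy concentration loss, and all hyperparameters are illustrative assumptions, not taken from the talk.

import torch

def gaussian_window(support, sigma):
    # Gaussian window on a fixed support; `sigma` (the continuous stand-in
    # for the window length) is a differentiable tensor.
    n = torch.arange(support, dtype=sigma.dtype) - (support - 1) / 2
    return torch.exp(-0.5 * (n / sigma) ** 2)

def spectrogram(x, sigma, support=512, hop=128):
    # Fixed frame positions (independent of sigma), differentiable window,
    # then FFT magnitude: gradients flow from the spectrogram back to sigma.
    frames = x.unfold(0, support, hop)          # (num_frames, support)
    return torch.fft.rfft(frames * gaussian_window(support, sigma), dim=-1).abs()

# Toy optimization of the window length with a standard gradient method.
x = torch.randn(8192)                           # placeholder signal
sigma = torch.tensor(64.0, requires_grad=True)
opt = torch.optim.SGD([sigma], lr=1e-2)
for _ in range(100):
    S = spectrogram(x, sigma)
    loss = -(S ** 2).sum() / (S.sum() ** 2 + 1e-12)   # toy concentration criterion
    opt.zero_grad()
    loss.backward()
    opt.step()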
14:45 Break
15:00 Enzo Tartaglione, « To the lottery ticket hypothesis and beyond: can we really make training efficient? »
Recent progress in deep learning optimization has shown that, with some a posteriori information about fully trained models, it is possible to obtain the same performance by training only a subset of their parameters, the ones said to have "won the initialization lottery".
Such a finding has a potentially high impact, from the theory to the applications of deep learning, notably in terms of energy consumption and frugal AI. However, not all of the proposed "efficient" methods match state-of-the-art performance when high sparsity is imposed, and they rely on models with unstructured sparsity, which notoriously introduces computational overheads.
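To fix ideas, here is a minimal PyTorch sketch of the a posteriori, magnitude-based selection behind the lottery ticket hypothesis, assuming the usual criterion of keeping the largest-magnitude weights of the fully trained model; the sparsity level and helper names are illustrative, and the resulting mask is unstructured, which is exactly the overhead issue raised above.

import torch

def winning_ticket_masks(trained_params, sparsity=0.9):
    # A posteriori criterion: keep the weights of the fully trained model
    # whose magnitude is above a global quantile threshold.
    flat = torch.cat([p.detach().abs().flatten() for p in trained_params])
    threshold = torch.quantile(flat, sparsity)
    return [(p.detach().abs() > threshold).float() for p in trained_params]

def reset_to_ticket(params, init_params, masks):
    # Rewind the surviving weights to their original initialization and zero
    # out the rest; only the masked subset is then retrained (the mask must
    # also be re-applied after each training update to keep pruned weights at zero).
    with torch.no_grad():
        for p, p0, m in zip(params, init_params, masks):
            p.copy_(p0 * m)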
15:30 Hicham Janati, « Averaging Spatio-temporal Signals using Optimal Transport and Soft Alignments »
16:00 Discussion, « Big data or frugal AI: where can optimization techniques help? »
16:40 End of the workshop
This event is reserved for partners of the chair. It will be available on site or online.