
On stochastic optimization and the Adam optimizer: Divergence, convergence rates, and acceleration techniques

Location: ZOOM + Room 0.016 (Institut für Informatik, Campus Poppelsdorf, Universität Bonn)
Event type: Lamarr
Prof. Arnulf Jentzen | School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China | Applied Mathematics: Institute for Analysis and Numerics, Faculty of Mathematics and Computer Science, University of Münster, Germany

Abstract: Stochastic gradient descent (SGD) optimization methods are nowadays the method of choice for the training of deep neural networks (DNNs) in artificial intelligence systems. In practically relevant training problems, the employed optimization scheme is often not the plain vanilla standard SGD method but instead a suitably accelerated and adaptive SGD optimization method such as the famous Adam optimizer. In this talk we show that Adam typically does not converge to minimizers or critical points of the objective function (the function one intends to minimize) but instead converges to zeros of another function, which we refer to as the Adam vector field. Moreover, we establish convergence rates in terms of the number of Adam steps and the size of the mini-batch for all strongly convex stochastic optimization problems. Finally, we present acceleration techniques for Adam in the context of deep learning approximations for partial differential equation and optimal control problems. The talk is based on joint works with Steffen Dereich, Thang Do, Robin Graeber, and Adrian Riekert.
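For readers less familiar with the scheme discussed in the abstract, the following is a minimal sketch of the standard Adam update rule (exponential moving averages of the first and second gradient moments with bias correction), applied to a toy strongly convex quadratic with noisy mini-batch gradients. The hyperparameter values, the toy objective, and the variable names are illustrative assumptions and are not taken from the referenced works; the "Adam vector field" whose zeros Adam approaches is introduced in those works and is not reproduced here.

    import numpy as np

    # Sketch of the standard Adam update (Kingma & Ba) on the toy strongly convex
    # objective f(theta) = 0.5 * ||theta||^2, whose unique minimizer is 0.
    rng = np.random.default_rng(0)

    alpha, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8   # illustrative hyperparameters
    batch_size, n_steps = 32, 2000

    theta = np.array([5.0, -3.0])       # initial iterate
    m = np.zeros_like(theta)            # first-moment estimate
    v = np.zeros_like(theta)            # second-moment estimate

    for t in range(1, n_steps + 1):
        # Unbiased mini-batch gradient estimate: the true gradient of f is theta.
        noise = rng.normal(scale=1.0, size=(batch_size, theta.size))
        g = theta + noise.mean(axis=0)

        m = beta1 * m + (1 - beta1) * g        # update biased first moment
        v = beta2 * v + (1 - beta2) * g**2     # update biased second raw moment
        m_hat = m / (1 - beta1**t)             # bias correction
        v_hat = v / (1 - beta2**t)
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)

    print("final iterate:", theta)      # close to the minimizer 0 for this toy problem

On this simple example the iterates approach the minimizer; the talk addresses the more delicate general situation, in which the limiting behaviour of Adam is governed by the Adam vector field rather than by the gradient of the objective.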

About the Speaker

Prof. Arnulf Jentzen

Profile Picture of Arnulf Jentzen © Arnulf Jentzen

Bio: Arnulf Jentzen (*November 1983) is a professor at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen) (since 2021) and a professor at the University of Münster (since 2019). In 2004 he started his undergraduate studies in mathematics at Goethe University Frankfurt in Germany, where he received his diploma degree in 2007 and completed his PhD in mathematics in 2009. The core research topics of his research group are machine learning approximation algorithms, computational stochastics, numerical analysis for high-dimensional partial differential equations (PDEs), stochastic analysis, and computational finance. Currently, he serves on the editorial boards of several scientific journals such as the Journal of Machine Learning, the SIAM Journal on Scientific Computing, the SIAM Journal on Numerical Analysis, and the SIAM/ASA Journal on Uncertainty Quantification. His research activities have been recognized through several major awards such as the Felix Klein Prize of the European Mathematical Society (EMS) (2020), an ERC Consolidator Grant from the European Research Council (ERC) (2022), the Joseph F. Traub Prize for Achievement in Information-Based Complexity (IBC) (2022), and a Frontier of Science Award in Mathematics (jointly with Jiequn Han and Weinan E) from the International Congress of Basic Science (ICBS) (2024). Details on the activities of his research group can be found at www.ajentzen.de.

References:

[1] S. Dereich & A. Jentzen, Convergence rates for the Adam optimizer, arXiv:2407.21078 (2024), 43 pages.

[2] S. Dereich, R. Graeber, & A. Jentzen, Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates, arXiv:2407.08100 (2024), 54 pages. 

[3] T. Do, A. Jentzen, & A. Riekert, Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks, arXiv:2503.01660 (2025), 42 pages.

[4] A. Jentzen & A. Riekert, Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks, arXiv:2402.05155 (2024), 36 pages, to appear in SIAM/ASA J. Uncertain. Quantif.

[5] S. Dereich, A. Jentzen, & A. Riekert, Sharp higher order convergence rates for the Adam optimizer, arXiv:2504.19426 (2025), 27 pages.