Substantiation of the backpropagation technique via the Hamilton—Pontryagin formalism for training nonconvex nonsmooth neural networks

Authors

  • V.I. Norkin, V.M. Glushkov Institute of Cybernetics of the NAS of Ukraine, Kyiv; NTU of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

DOI:

https://doi.org/10.15407/dopovidi2019.12.019

Keywords:

deep learning, machine learning, multilayer neural networks, nonsmooth nonconvex optimization, stochastic generalized gradient, stochastic optimization

Abstract

The paper notes the similarity between stochastic optimal control of discrete dynamical systems and the learning of multilayer neural networks. It focuses on contemporary deep networks with nonconvex nonsmooth loss and activation functions. Machine learning problems are treated as nonconvex nonsmooth stochastic optimization problems. So-called generalized differentiable functions are used as a model of nonsmooth nonconvex dependences. A method for calculating stochastic generalized gradients of the learning quality functional for such systems is substantiated on the basis of the Hamilton—Pontryagin formalism. This method extends the well-known “backpropagation” machine learning technique to nonconvex nonsmooth networks. Stochastic generalized gradient algorithms are extended to the training of nonconvex nonsmooth neural networks.
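To make the connection concrete, here is a minimal sketch (not the author's code; all names are hypothetical) of the idea the abstract describes: the forward pass of a small ReLU network is a discrete state recursion, and the backward pass is the adjoint (costate) recursion of the Hamilton—Pontryagin formalism, i.e. backpropagation. Since ReLU is nonsmooth, the backward pass selects one element of the generalized gradient set, and training proceeds by stochastic generalized gradient descent on single samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def relu_subgrad(z):
    # one element of the generalized gradient set;
    # at z == 0 any value in [0, 1] would be admissible
    return (z > 0).astype(float)

def forward(params, x):
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1          # state recursion, layer 1
    a1 = relu(z1)             # nonsmooth activation
    y = W2 @ a1 + b2          # output layer
    return y, (x, z1, a1)

def loss_and_subgrads(params, x, t):
    """Squared loss and a stochastic generalized gradient via the adjoint recursion."""
    W1, b1, W2, b2 = params
    y, (x, z1, a1) = forward(params, x)
    r = y - t
    loss = 0.5 * float(r @ r)
    # costate (adjoint) recursion: psi2 = dL/dy, psi1 = W2^T psi2 * g'(z1)
    psi2 = r
    psi1 = (W2.T @ psi2) * relu_subgrad(z1)
    grads = (np.outer(psi1, x), psi1, np.outer(psi2, a1), psi2)
    return loss, grads

def sgd_step(params, grads, lr):
    return tuple(p - lr * g for p, g in zip(params, grads))

# toy problem: learn the nonsmooth target t = |x| from random scalar samples
params = (rng.normal(size=(8, 1)) * 0.5, np.zeros(8),
          rng.normal(size=(1, 8)) * 0.5, np.zeros(1))
for step in range(2000):
    x = rng.uniform(-1.0, 1.0, size=1)
    loss, grads = loss_and_subgrads(params, x, np.abs(x))
    params = sgd_step(params, grads, lr=0.05)
```

The adjoint variables `psi1`, `psi2` play the role of the Pontryagin costates; for smooth activations the recursion reduces to classical backpropagation, and the paper's contribution is justifying the same scheme when the activations and loss are only generalized differentiable.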


References

Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep learning. Cambridge: The MIT Press. Retrieved from http://www.deeplearningbook.org

Bottou, L., Curtis, F. E. & Nocedal, J. (2018). Optimization methods for large-scale machine learning. SIAM Rev., 60, No. 2, pp. 223-311. Doi: https://doi.org/10.1137/16M1080173

Newton, D., Yousefian, F. & Pasupathy, R. (2018). Stochastic gradient descent: recent trends. INFORMS TutORials in Operations Research, pp. 193-220. Doi: https://doi.org/10.1287/educ.2018.0191

Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, pp. 533-536. Doi: https://doi.org/10.1038/323533a0

Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, pp. 85-117. Doi: https://doi.org/10.1016/j.neunet.2014.09.003

Davis, D., Drusvyatskiy, D., Kakade, S. & Lee, J. D. (2019). Stochastic subgradient method converges on tame functions. Found. Comput. Math., pp. 1-36. Doi: https://doi.org/10.1007/s10208-018-09409-5

Clarke, F. H. (1990). Optimization and nonsmooth analysis. Classics in Applied Mathematics, Vol. 5. 2nd ed. Philadelphia, PA: SIAM. Doi: https://doi.org/10.1137/1.9781611971309

Norkin, V. I. (1980). Generalized differentiable functions. Cybernetics, 16, No. 1, pp. 10-12. Doi: https://doi.org/10.1007/BF01099354

Mikhalevich, V. S., Gupal, A. M. & Norkin, V. I. (1987). Methods of nonconvex optimization. Moscow: Nauka (in Russian).

Norkin, V. I. (1986). Stochastic generalized-differentiable functions in the problem of nonconvex nonsmooth stochastic optimization. Cybernetics, 22, No. 6, pp. 804-809. Doi: https://doi.org/10.1007/BF01068698

Bryson, A. E. & Ho, Y-C. (1969). Applied optimal control: optimization, estimation, and control. Waltham: Blaisdell Publ. Co.

Ermoliev, Y. M. (1976). Methods of stochastic programming. Moscow: Nauka (in Russian).

Norkin, V. I. (2019). Generalized gradients in problems of dynamic optimization, optimal control, and machine learning. Preprint. V.M. Glushkov Institute of Cybernetics of the National Academy of Sciences of Ukraine, Kyiv. Retrieved from http://www.optimization-online.org/DB_HTML/2019/09/7374.html

Ermol’ev, Yu. M. & Norkin, V. I. (1998). Stochastic generalized gradient method for solving nonconvex nonsmooth stochastic optimization problems. Cybern. Syst. Anal., 34, No. 2, pp. 196-215. Doi: https://doi.org/10.1007/BF02742069

Ermoliev, Y. M. & Norkin, V. I. (2003). Solution of nonconvex nonsmooth stochastic optimization problems. Cybern. Syst. Anal., 39, No. 5, pp. 701-715. Doi: https://doi.org/10.1023/B:CASA.0000012091.84864.65


Published

24.04.2024

How to Cite

Norkin, V. (2024). Substantiation of the backpropagation technique via the Hamilton—Pontryagin formalism for training nonconvex nonsmooth neural networks. Reports of the National Academy of Sciences of Ukraine, (12), 19–26. https://doi.org/10.15407/dopovidi2019.12.019

Section

Information Science and Cybernetics