Substantiation of the backpropagation technique via the Hamilton—Pontryagin formalism for training nonconvex nonsmooth neural networks
DOI: https://doi.org/10.15407/dopovidi2019.12.019

Keywords: deep learning, machine learning, multilayer neural networks, nonsmooth nonconvex optimization, stochastic generalized gradient, stochastic optimization

Abstract
The paper observes the similarity between the stochastic optimal control of discrete dynamical systems and the training of multilayer neural networks. It focuses on contemporary deep networks with nonconvex nonsmooth loss and activation functions. The machine learning problems are treated as nonconvex nonsmooth stochastic optimization problems. As a model of nonsmooth nonconvex dependences, the so-called generalized differentiable functions are used. A method for calculating the stochastic generalized gradients of a learning quality functional for such systems is substantiated on the basis of the Hamilton—Pontryagin formalism. This method extends the well-known “backpropagation” machine learning technique to nonconvex nonsmooth networks. Stochastic generalized gradient algorithms are thereby extended to the training of nonconvex nonsmooth neural networks.
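To illustrate the kind of training procedure the abstract refers to, the following is a minimal sketch (not the authors' implementation) of one stochastic generalized-gradient step for a small network with the nonsmooth ReLU activation; the adjoint (backward) pass plays the role of the conjugate variables in the Hamilton—Pontryagin formalism. All function and variable names are illustrative assumptions.

```python
import numpy as np

# Sketch only: one stochastic generalized-gradient step for a two-layer
# network y = W2 @ relu(W1 @ x). ReLU is nonsmooth at 0; any value in
# [0, 1] is a valid generalized derivative there, and the selection 0 is
# used below, mirroring common backpropagation practice.

def relu(z):
    return np.maximum(z, 0.0)

def relu_ggrad(z):
    # A measurable selection from the generalized gradient of ReLU.
    return (z > 0.0).astype(z.dtype)

def sgg_step(W1, W2, x, y_target, lr=1e-2):
    # Forward pass: the "state equations" of the discrete dynamical system.
    z1 = W1 @ x
    a1 = relu(z1)
    y = W2 @ a1
    # Quadratic sample loss; its gradient initializes the adjoint recursion.
    err = y - y_target
    # Backward (adjoint) pass: conjugate variables propagated layer by layer.
    psi2 = err                                # adjoint at the output layer
    g_W2 = np.outer(psi2, a1)                 # generalized gradient w.r.t. W2
    psi1 = (W2.T @ psi2) * relu_ggrad(z1)     # adjoint at the hidden layer
    g_W1 = np.outer(psi1, x)                  # generalized gradient w.r.t. W1
    # Stochastic generalized-gradient descent update on one sample.
    return W1 - lr * g_W1, W2 - lr * g_W2

# Example usage on a single random training pair.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3)) * 0.1
W2 = rng.standard_normal((2, 4)) * 0.1
x, y_target = rng.standard_normal(3), rng.standard_normal(2)
W1, W2 = sgg_step(W1, W2, x, y_target)
```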
References
Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep learning. Cambridge: The MIT Press. Retrieved from http://www.deeplearningbook.org
Bottou, L., Curtis, F. E. & Nocedal, J. (2018). Optimization methods for large-scale machine learning. SIAM Rev., 60, No. 2, pp. 223-311. Doi: https://doi.org/10.1137/16M1080173
Newton, D., Yousefian, F. & Pasupathy, R. (2018). Stochastic gradient descent: recent trends. INFORMS TutORials in Operations Research, pp. 193-220. Doi: https://doi.org/10.1287/educ.2018.0191
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, pp. 533-536. Doi: https://doi.org/10.1038/323533a0
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, pp. 85-117. Doi: https://doi.org/10.1016/j.neunet.2014.09.003
Davis, D., Drusvyatskiy, D., Kakade, S. & Lee, J. D. (2019). Stochastic subgradient method converges on tame functions. Found. Comput. Math., pp. 1-36. Doi: https://doi.org/10.1007/s10208-018-09409-5
Clarke, F. H. (1990). Optimization and nonsmooth analysis. Classics in Applied Mathematics, Vol. 5. 2nd ed. Philadelphia, PA: SIAM. Doi: https://doi.org/10.1137/1.9781611971309
Norkin, V. I. (1980). Generalized differentiable functions. Cybernetics, 16, No. 1, pp. 10-12. Doi: https://doi.org/10.1007/BF01099354
Mikhalevich, V. S., Gupal, A. M. & Norkin, V. I. (1987). Methods of nonconvex optimization. Moscow: Nauka (in Russian).
Norkin, V. I. (1986). Stochastic generalized-differentiable functions in the problem of nonconvex nonsmooth stochastic optimization. Cybernetics, 22, No. 6, pp. 804-809. Doi: https://doi.org/10.1007/BF01068698
Bryson, A. E. & Ho, Y-C. (1969). Applied optimal control: optimization, estimation, and control. Waltham: Blaisdell Publ. Co.
Ermoliev, Y. M. (1976). Methods of stochastic programming. Moscow: Nauka (in Russian).
Norkin, V. I. (2019). Generalized gradients in problems of dynamic optimization, optimal control, and machine learning. Preprint. V.M. Glushkov Institute of Cybernetics of the National Academy of Sciences of Ukraine, Kyiv. Retrieved from http://www.optimization-online.org/DB_HTML/2019/09/7374.html
Ermol’ev, Yu. M. & Norkin, V. I. (1998). Stochastic generalized gradient method for solving nonconvex nonsmooth stochastic optimization problems. Cybern. Syst. Anal., 34, No. 2, pp. 196-215. Doi: https://doi.org/10.1007/BF02742069
Ermoliev, Y. M. & Norkin, V. I. (2003). Solution of nonconvex nonsmooth stochastic optimization problems. Cybern. Syst. Anal., 39, No. 5, pp. 701-715. Doi: https://doi.org/10.1023/B:CASA.0000012091.84864.65