Global Optimality in Matrix and Tensor Factorization, Deep Learning, and Beyond Parts I & II

Thursday, June 23, 2016

9:00 am - 11:50 am
Fitzpatrick Center Schiciano Auditorium


René Vidal, Professor, Johns Hopkins University

Abstract: The past few years have seen a dramatic increase in the performance of pattern recognition systems due to the introduction of deep neural networks for representation learning. However, the mathematical reasons for this success remain elusive. A key challenge is that the problem of learning the parameters of a neural network is a non-convex optimization problem, which makes finding the globally optimal parameters extremely difficult. Building on ideas from convex relaxations of matrix factorizations, in this talk I will present a very general framework which allows for the analysis of a wide range of non-convex factorization problems, including matrix factorization, tensor factorization, and deep neural network training formulations. In particular, I will present sufficient conditions under which a local minimum of the non-convex optimization problem is a global minimum and show that if the size of the factorized variables is large enough, then from any initialization it is possible to find a global minimizer using a purely local descent algorithm. Our framework also provides a partial theoretical justification for the increasingly common use of Rectified Linear Units (ReLUs) in deep neural networks and offers guidance on deep network architectures and regularization strategies to facilitate efficient optimization. This is joint work with Benjamin Haeffele.