# SGT Seminar

Alexandr Andoni (Microsoft Research)

Calvin Lab 116

## Learning Polynomials Over Reals

We study the learnability a (sparse) multi-variate polynomial over a real domain. In part #1, we will see an algorithm that suggests that the real equivalent of the "noisy parity problem" is in fact simpler than its boolean version. Specifically, the algorithm approximately reconstructs a k-sparse degree-d polynomial F, given only a set of examples x_i drawn uniformly from the n-dimensional real cube, together with evaluations F(x_i). The result holds even in the "noisy setting". The runtime of our algorithm is polynomial in n,k, and C_d where C_d depends only on degree d. In contrast, in the "boolean version" of this problem, where examples x are drawn from the hypercube, we do not know how to break the n^{\Omega(d)} time barrier, even for k=1 (and some believe it may be impossible to do so).

In part #2, I will pose the question whether "natural, simple algorithms" learn such polynomials efficiently. In particular, we will focus on "neural network + gradient descent" algorithm, which is successfully used in practice for learning tasks, albeit few theoretical guarantees are known to explain such successes. We prove that for a neural network with sufficiently many units (n^{O(d)}), the generic gradient descent algorithm learns any low degree polynomial, assuming we initialize the weights randomly. We leave open the question of whether /sparse/ polynomials can be learned with /small/ neural networks (of complexity as in part 1).

Joint work with Rina Panigrahy, Greg Valiant, and Li Zhang (based on SODA'14 and ICML'14 papers).