Abstract: Neural networks (NNs) with random weights appear in a variety of machine learning applications, perhaps most prominently as initialization of many deep learning algorithms. We take one step closer to their theoretical foundation by addressing the following data separation problem: Under what conditions can a random NN make two classes \(\mathcal X^{-}, \mathcal X^{+} \subset \mathbb R^{d}\) (with positive distance) linearly separable? We show that a two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. Crucially, the number of required neurons is explicitly linked to geometric properties of the underlying sets \(\mathcal X^{-}\) and \(\mathcal X^{+}\). This instance-specific viewpoint allows us to overcome the usual curse of dimensionality (exponential width of the layers) in non-pathological situations where the data carries low-complexity structure. The key ingredient to make this intuition precise is a geometric notion of complexity (based on the Gaussian mean width), which leads to sound and informative separation guarantees. On a technical level, our proof strategy departs from the standard statistical learning approach, and instead relies on tools from convex geometry and high-dimensional probability. Besides the mathematical results, we also discuss several connections to machine learning, such as memorization, generalization, and adversarial robustness.
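The following is a minimal numerical sketch (not taken from the preprint) of the setup described above: two classes with positive distance are passed through a single random ReLU layer with i.i.d. standard Gaussian weights and uniformly distributed biases, and linear separability of the lifted data is then verified. The choice of classes (two concentric spheres), the dimensions d and n, the bias range lam, and the least-squares separability check are illustrative assumptions, not the constants or proof technique of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 1000        # input dimension and number of random ReLU neurons (illustrative)
lam = 3.0              # bias range (illustrative)

def sample_sphere(num, radius):
    """Sample points uniformly from the sphere of the given radius in R^d."""
    pts = rng.normal(size=(num, d))
    return radius * pts / np.linalg.norm(pts, axis=1, keepdims=True)

# Two classes with positive distance that are NOT linearly separable in input space:
# an inner sphere (label -1) and an outer sphere (label +1).
X_minus = sample_sphere(200, 1.0)
X_plus = sample_sphere(200, 2.0)
X = np.vstack([X_minus, X_plus])
y = np.hstack([-np.ones(len(X_minus)), np.ones(len(X_plus))])

# Random ReLU layer x -> ReLU(Wx + b) with i.i.d. standard Gaussian weights
# and biases drawn uniformly from [-lam, lam].
W = rng.normal(size=(n, d))
b = rng.uniform(-lam, lam, size=n)
Phi = np.maximum(X @ W.T + b, 0.0)          # lifted features, shape (num_points, n)

# Separability check: fit a linear map on the lifted features by least squares;
# if it already assigns every point the correct sign, the lifted classes are
# linearly separable (a simple sufficient criterion, not the paper's argument).
Z = np.hstack([Phi, np.ones((len(y), 1))])  # append a constant column for the output bias
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
print("linearly separable after the random ReLU layer:",
      bool(np.all(np.sign(Z @ w) == y)))
```

In this toy configuration the number of neurons exceeds the number of data points, so separation is easy to certify; the point of the paper is that, for sets with low-complexity structure, far fewer neurons suffice, with the width governed by the Gaussian mean width of the classes.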
This abstract is a very condensed overview of our recent findings provided in this preprint. For more details, possible refinements, related literature, and proofs, we refer the reader to this article.