Summary: Recent developments in compressed sensing, machine learning and dimensionality reduction have reinvigorated interest in the theory and applications of embeddings. Embeddings are transformations of signals and sets of signals that approximately preserve some aspects of the geometry of the set, while reducing the complexity of handling such signals. For example, Johnson-Lindenstrauss (JL) embeddings–one of the earliest and most celebrated embedding constructions–provide tools for significant dimensionality reduction, agnostic to the data. Recent literature has significantly expanded the range of embedding constructions available, often departing from the linearity of JL’s method, to achieve different goals. This recent literature has shown that properly designed embeddings are a powerful tool for efficient information representation.
Our goal with this tutorial is to expose all this body of work, the available solutions, the theoretical underpinnings and practical considerations, as well as the problems still open in the field. We will provide a wide treatment of the topic, drawn both from our extensive work in the area, as well as other literature available in the field. Our objective is also to provide a well-balanced presentation of both theory and practice, guided by intuitive explanations, assuming only minimal background on dimensionality reduction and embedding theory.
More precisely, the tutorial aims to overview the fundamentals of embedding constructions, starting with the foundational work of Johnson and Lindenstrauss, and developing the general framework of randomized constructions. We will discuss different embedding goals, including distance and local geometry preservation, locality sensitive hashing (LSH), kernel machine implementation, and feature quantization, among others. We will also explore different applications in, e.g., classification, detection, and feature compression. Although our focus is on randomized, universal and data-agnostic constructions, we will also explore JL-style constructions that learn from the data geometry, to further improve performance at the expense of universality. The tutorial will conclude with discussion of the open problems in the area, future trends and promising research directions.