Quasi-isometric embeddings of vector sets with quantized sub-Gaussian projections
Last January, I was honored to be invited to RWTH Aachen University by Holger Rauhut and Sjoerd Dirksen to give a talk on the general topic of quantized compressed sensing. In particular, I decided to focus my presentation on the quasi-isometric embeddings arising in 1-bit compressed sensing, as developed by a few researchers in this field (e.g., Petros Boufounos, Richard Baraniuk, Yaniv Plan, Roman Vershynin and myself). In contrast to isometric (bi-Lipschitz) embeddings, which suffer only from a multiplicative distortion, as with the restricted isometry property (RIP) for sets of sparse vectors or the Johnson-Lindenstrauss lemma for finite sets, a quasi-isometric embedding $ \bf E: \mathcal K \to \mathcal L$ between two metric spaces $ (\mathcal K, d_{\mathcal K})$ and $ (\mathcal L,d_{\mathcal L})$ is characterized by both a multiplicative and an additive distortion, i.e., for some values $ \Delta_{\oplus},\Delta_{\otimes}>0$, $ \bf E$ satisfies
$ (1- \Delta_{\otimes}) d_{\mathcal K}(\boldsymbol u, \boldsymbol v) - \Delta_\oplus\leq d_{\mathcal L}(\bf E(\boldsymbol u), \bf E(\boldsymbol v)) \leq (1 + \Delta_{\otimes}) d_{\mathcal K}(\boldsymbol u, \boldsymbol v)+\Delta_\oplus,$
for all $ \boldsymbol u,\boldsymbol v \in \mathcal K$. In the case of 1-bit compressed sensing, one observes a quasi-isometric relation between the angular distance of two vectors (or signals) in $ \mathbb R^N$ and the Hamming distance of their 1-bit quantized projections in $ \{\pm 1\}^M$. This mapping is simply obtained by keeping the sign (componentwise) of their multiplication by a Gaussian random matrix $ \boldsymbol \Phi \in \mathbb R^{M \times N}$, i.e., $ \boldsymbol A_{\rm 1bit}(\boldsymbol u) := {\rm sign}(\boldsymbol \Phi \boldsymbol u)$. With high probability, and for $ M$ sufficiently large, one then has [1,2]
$ d_{\rm ang}(\boldsymbol x,\boldsymbol y) - \epsilon \leq d_H(\boldsymbol A_{\rm 1bit}(\boldsymbol x), \boldsymbol A_{\rm 1bit}(\boldsymbol y)) \leq d_{\rm ang}(\boldsymbol x,\boldsymbol y) + \epsilon.$
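To make this relation concrete, here is a minimal NumPy sketch of the 1-bit mapping (a sanity check of my own, not code from the cited works); the dimensions and random test vectors are arbitrary, and I use the common convention normalizing $ d_{\rm ang}$ by $ \pi$ so that both distances live in $ [0,1]$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 128, 20000  # ambient dimension, number of 1-bit measurements (illustrative)
Phi = rng.standard_normal((M, N))  # Gaussian sensing matrix

def A_1bit(x):
    """1-bit mapping: componentwise sign of the random projections."""
    return np.sign(Phi @ x)

def d_ang(x, y):
    """Angular distance, normalized by pi so that it lies in [0, 1]."""
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0)) / np.pi

def d_H(a, b):
    """Normalized Hamming distance between two sign vectors."""
    return np.mean(a != b)

x, y = rng.standard_normal(N), rng.standard_normal(N)
print(d_ang(x, y), d_H(A_1bit(x), A_1bit(y)))  # both values are close for large M
```

For a single pair of vectors the two printed values agree up to the $ O(M^{-1/2})$ fluctuations of the empirical Hamming distance; the cited works make this agreement uniform over all (sparse) pairs.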
For the quantized mapping $ \boldsymbol A(\boldsymbol x) := \mathcal Q(\boldsymbol \Phi \boldsymbol x + \boldsymbol \xi)$, where $ \mathcal Q(\lambda) := \delta \lfloor \lambda/\delta \rfloor$ is a uniform quantizer of resolution $ \delta > 0$ applied componentwise and $ \boldsymbol \xi$ is a uniform random dither, a similar quasi-isometry holds for sparse vectors: with high probability, for all $ K$-sparse $ \boldsymbol x, \boldsymbol y \in \mathbb R^N$ and provided $ M \geq C \epsilon^{-2} K \log(N/K)$,
$ (\sqrt{\frac{2}{\pi}} - \epsilon)\,\|\boldsymbol x - \boldsymbol y\| - c\delta\epsilon \leq \frac{1}{M}\|\boldsymbol A(\boldsymbol x) - \boldsymbol A(\boldsymbol y)\|_1 \leq (\sqrt{\frac{2}{\pi}}+\epsilon)\,\|\boldsymbol x - \boldsymbol y\| + c\delta\epsilon,\qquad(1)$
for some general constants $ C,c>0$.
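Relation (1) is easy to probe numerically. The sketch below implements my reading of the mapping $ \bf A$ defined above (uniform quantizer $ \mathcal Q(\lambda) = \delta\lfloor \lambda/\delta\rfloor$ with a uniform dither on $ [0,\delta]$; bin width, sizes and seed are illustrative choices) and compares both sides of (1):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, delta = 128, 20000, 0.5  # illustrative sizes and quantization bin width
Phi = rng.standard_normal((M, N))
xi = rng.uniform(0.0, delta, size=M)  # one uniform dither per measurement

def A(x):
    """Quantized mapping: dithered projections, uniformly quantized at resolution delta."""
    return delta * np.floor((Phi @ x + xi) / delta)

x, y = rng.standard_normal(N), rng.standard_normal(N)
lhs = np.abs(A(x) - A(y)).sum() / M
print(lhs, np.sqrt(2 / np.pi) * np.linalg.norm(x - y))  # close for large M
```

The factor $ \sqrt{2/\pi}$ is simply $ \mathbb E|g|$ for $ g \sim \mathcal N(0,1)$, and the uniform dither ensures $ \mathbb E_\xi|\mathcal Q(a+\xi) - \mathcal Q(b+\xi)| = |a-b|$, which is why the two printed values match.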
One observes that the additive distortion of this last relation reads $ \Delta_\oplus = c\delta\epsilon$, i.e., it can be made arbitrarily small by reducing $ \epsilon$! However, directly integrating the quantization into the RIP satisfied by $ \boldsymbol \Phi$ would rather have led to a constant additive distortion, a function of $ \delta$ only.
Remark: as kindly pointed out by an anonymous expert reviewer, a much shorter and more elegant proof of this fact exists, relying on the sub-Gaussian properties of the quantized mapping $ \bf A$ (see Appendix A in the associated revised paper). In that work, however, the question remained of how to extend (1) to vectors taken in any (bounded) subset $ \mathcal K$ of $ \mathbb R^N$, and to non-Gaussian (but sub-Gaussian) sensing matrices. Coming back to the 1-bit case, observe that the Hamming distance above can be rewritten as
$ d_H(\boldsymbol A_{\rm 1bit}(\boldsymbol x), \boldsymbol A_{\rm 1bit}(\boldsymbol y)) = \frac{1}{M} \sum_i \mathbb I[\mathcal E (\boldsymbol \varphi_i^T \boldsymbol x, \boldsymbol \varphi_i^T \boldsymbol y)],$
where $ \boldsymbol \varphi_i$ is the $ i^{\rm th}$ row of $ \boldsymbol \Phi$, $ \mathbb I(A)$ is the set indicator function, equal to 1 if $ A$ is non-empty and 0 otherwise, and
$ \mathcal E(a,b) = \{{\rm sign}(a) \neq {\rm sign}(b)\}$.
In words, $ d_H(\boldsymbol A_{\rm 1bit}(\boldsymbol x), \boldsymbol A_{\rm 1bit}(\boldsymbol y))$ counts (up to the normalization by $ M$) the number of hyperplanes, defined by the normals $ \boldsymbol \varphi_i$, that separate $ \boldsymbol x$ from $ \boldsymbol y$.
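This interpretation is a one-line check in NumPy: the hyperplane with normal $ \boldsymbol \varphi_i$ separates $ \boldsymbol x$ from $ \boldsymbol y$ exactly when the two projections have opposite signs (sizes below are again illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 64, 500
Phi = rng.standard_normal((M, N))
x, y = rng.standard_normal(N), rng.standard_normal(N)

# phi_i separates x from y iff phi_i.x and phi_i.y have opposite signs,
# i.e., iff their product is negative.
separating = (Phi @ x) * (Phi @ y) < 0
hamming = np.mean(np.sign(Phi @ x) != np.sign(Phi @ y))
print(separating.mean(), hamming)  # the same quantity, computed two ways
```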
Without giving too many details in this post, the work [2] leverages this equivalence to make a connection with tessellations of the $ (N-1)$-sphere, where it is shown that the number of such separating random hyperplanes is somehow close (up to some distortions) to the angular distance between the two vectors. They obtain this by “softening” the Hamming distance above, i.e., by introducing, for some $ t \in \mathbb R$, the soft Hamming distance
$ d^t_H(\boldsymbol A_{\rm 1bit}(\boldsymbol x), \boldsymbol A_{\rm 1bit}(\boldsymbol y)) := \frac{1}{M} \sum_i \mathbb I[\mathcal F^t(\boldsymbol \varphi_i^T \boldsymbol x, \boldsymbol \varphi_i^T \boldsymbol y)],$
with now
$ \mathcal F^t(a,b) = \{a > t, b \leq -t\} \cup \{a \leq -t, b > t\}$.
This distance has an interesting continuity property compared to $ d_H = d^0_H$. In particular, if one allows $ t$ to vary, it is continuous up to small ($ \ell_1$) perturbations of $ \boldsymbol x$ and $ \boldsymbol y$.
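In code, $ d^t_H$ is a direct transcription of the definition of $ \mathcal F^t$ (a small sketch of mine, with illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 64, 5000
Phi = rng.standard_normal((M, N))
x, y = rng.standard_normal(N), rng.standard_normal(N)

def soft_hamming(t):
    """Soft Hamming distance d_H^t: a pair of projections is counted only when
    both clear the margin t with opposite signs (the event F^t in the text)."""
    a, b = Phi @ x, Phi @ y
    return np.mean(((a > t) & (b <= -t)) | ((a <= -t) & (b > t)))

# t = 0 recovers d_H; the count only shrinks as t grows (and grows as t decreases).
print([soft_hamming(t) for t in (-0.5, 0.0, 0.5)])
```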
In our context, considering the quantized mapping $ \bf A$ specified above, we can define the generalized softened distance:
$ \mathcal Q^t(\boldsymbol x,\boldsymbol y) := \tfrac{\delta}{M}\,\sum_{i=1}^M \sum_{k\in\mathbb Z} \mathbb I[\mathcal F^t(\boldsymbol \varphi_i^T \boldsymbol x + \xi_i - k\delta, \boldsymbol \varphi_i^T \boldsymbol y + \xi_i - k\delta)].$
When $ t=0$ we actually recover the distance used in the quasi-isometry (1), i.e.,
$ \mathcal Q^0(\boldsymbol x,\boldsymbol y) = \frac{1}{M} \|\boldsymbol A(\boldsymbol x) - \boldsymbol A(\boldsymbol y)\|_1,$

similarly to the way the Hamming distance is recovered in the 1-bit case.
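The following sketch implements $ \mathcal Q^t$ by truncating the sum over $ k$ to the finitely many grid offsets that can contribute, and verifies the $ t=0$ identity above (again my own illustration, with arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, delta = 64, 2000, 0.5
Phi = rng.standard_normal((M, N))
xi = rng.uniform(0.0, delta, size=M)

def A(x):
    """Dithered uniform quantizer of the projections, as before."""
    return delta * np.floor((Phi @ x + xi) / delta)

def Q_t(x, y, t=0.0):
    """Generalized softened distance: for each measurement i, count the grid
    offsets k*delta at which the dithered projections of x and y fall on
    opposite sides with margin t; the infinite sum over k is truncated to
    the offsets that can actually contribute."""
    a, b = Phi @ x + xi, Phi @ y + xi
    lo = np.floor((min(a.min(), b.min()) - abs(t)) / delta) - 1
    hi = np.ceil((max(a.max(), b.max()) + abs(t)) / delta) + 1
    ks = delta * np.arange(lo, hi + 1)
    aa, bb = a[:, None] - ks, b[:, None] - ks  # shape (M, number of offsets)
    flips = ((aa > t) & (bb <= -t)) | ((aa <= -t) & (bb > t))
    return delta * flips.sum() / M

x, y = rng.standard_normal(N), rng.standard_normal(N)
print(Q_t(x, y, 0.0), np.abs(A(x) - A(y)).sum() / M)  # equal almost surely
```

With a continuous distribution for $ \boldsymbol \Phi$ and the dither, ties on the quantization grid occur with probability zero, so the two printed values coincide.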
The interpretation of $ \mathcal Q^t(\boldsymbol x,\boldsymbol y)$ when $ t=0$ is that we again count the hyperplanes, defined by the normals $ \boldsymbol \varphi_i$, that separate $ \boldsymbol x$ from $ \boldsymbol y$; but now, for each measurement index $ 1 \leq i \leq M$, this count can exceed 1, since each normal $ \boldsymbol \varphi_i$ carries multiple parallel hyperplanes, each $ \delta$ apart. This is connected to the “hyperplane wave partitions” defined in [7,8], e.g., to explain the quantization of vector frame coefficients. As explained in the paper, when $ t\neq 0$, $ \mathcal Q^t$ relaxes (for $ t < 0$) or strengthens (for $ t > 0$) the conditions under which a separating hyperplane is counted. As for 1-bit projections, we can show the continuity of $ \mathcal Q^t(\boldsymbol x,\boldsymbol y)$ with respect to small (now $ \ell_2$) perturbations of $ \boldsymbol x$ and $ \boldsymbol y$. This allows us to study the random (sub-Gaussian) concentration of $ \mathcal Q^t(\boldsymbol x,\boldsymbol y)$ on a covering of the set of interest, and then to extend it to the whole set by continuity. From this observation, and after a few months of work, I have gathered all these developments in a paper now submitted to arXiv.
Briefly, benefiting from the context defined above, this work contains two main results. First, it shows that, given a symmetric sub-Gaussian distribution $ \varphi$, if the number of measurements satisfies
$ M \geq C \epsilon^{-5} w(\mathcal K)^2$
where $ w(\mathcal K)$ denotes the Gaussian mean width of $ \mathcal K$, and if the sensing matrix $ \boldsymbol \Phi \in \mathbb R^{M \times N}$ has i.i.d. entries drawn from $ \varphi$, then, with high probability, for all $ \boldsymbol x, \boldsymbol y \in \mathcal K$ such that
$ \boldsymbol x - \boldsymbol y \in C_{K_0} = \{\boldsymbol u \in \mathbb R^N: K_0\|\boldsymbol u\|^2_\infty \leq \|\boldsymbol u\|^2\}$
we have, for some constant $ \kappa_{\rm sg}$ depending on the sub-Gaussian distribution $ \varphi$ (with $ \kappa_{\rm sg} = 0$ if $ \varphi$ is Gaussian),

$ (\sqrt{\frac{2}{\pi}} - \epsilon - \frac{\kappa_{\rm sg}}{\sqrt{K_0}})\,\|\boldsymbol x - \boldsymbol y\| - c\epsilon\delta\ \leq\ \frac{1}{M} \|\boldsymbol A(\boldsymbol x) - \boldsymbol A(\boldsymbol y)\|_1\ \leq\ (\sqrt{\frac{2}{\pi}} + \epsilon + \frac{\kappa_{\rm sg}}{\sqrt{K_0}})\,\|\boldsymbol x - \boldsymbol y\| + c\epsilon\delta.$

Interestingly, in this quasi-isometric embedding, we notice that the additive distortion is driven by $ \epsilon\delta$, as in (1), while the multiplicative distortion now reads
$ \Delta_{\otimes} = \epsilon + \frac{\kappa_{\rm sg}}{\sqrt{K_0}}.$
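The role of the term $ \frac{\kappa_{\rm sg}}{\sqrt{K_0}}$ can be observed empirically. In the illustrative experiment below (my own sanity check, not taken from the paper), a Rademacher matrix reproduces the $ \sqrt{2/\pi}$ slope on an “anti-sparse” difference ($ K_0 = N$) but visibly deviates from it on a 1-sparse one, while a Gaussian matrix handles both:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, delta = 256, 20000, 0.5
xi = rng.uniform(0.0, delta, size=M)

def avg_l1(Phi, u):
    """(1/M) ||A(u) - A(0)||_1, i.e., the embedded distance for the pair (u, 0)."""
    Au = delta * np.floor((Phi @ u + xi) / delta)
    A0 = delta * np.floor(xi / delta)  # zero here, since xi lies in [0, delta)
    return np.abs(Au - A0).sum() / M

Phi_gauss = rng.standard_normal((M, N))
Phi_rad = rng.choice([-1.0, 1.0], size=(M, N))  # Rademacher: sub-Gaussian, kappa_sg > 0

u_sparse = np.zeros(N)
u_sparse[0] = 1.0                   # 1-sparse unit difference (smallest K_0)
u_dense = np.ones(N) / np.sqrt(N)   # "anti-sparse" unit difference (K_0 = N)

print("target slope:", np.sqrt(2 / np.pi))
for name, Phi in [("gaussian", Phi_gauss), ("rademacher", Phi_rad)]:
    print(name, avg_l1(Phi, u_sparse), avg_l1(Phi, u_dense))
```

On the 1-sparse difference the Rademacher matrix returns a value close to $ 1$ (each projection $ \boldsymbol \varphi_i^T \boldsymbol u$ is then $ \pm 1$) instead of $ \sqrt{2/\pi} \approx 0.8$, illustrating the extra multiplicative distortion.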
In addition to its common dependence on the precision $ \epsilon$, this distortion is also a function of how “anti-sparse” the difference $ \boldsymbol x - \boldsymbol y$ is. Indeed, if $ \boldsymbol u \in C_{K_0}$, then $ \boldsymbol u$ cannot be sparser than $ K_0$ (its support must count at least $ K_0$ indices), even though $ C_{K_0} \neq \mathbb R^N \setminus \Sigma_{K_0}$. In other words, when the matrix $ \boldsymbol \Phi$ is non-Gaussian (but still sub-Gaussian), the distortion is smaller for “anti-sparse” vector differences. Second, in the particular case where $ \boldsymbol \Phi$ is Gaussian, the distortion $ \epsilon$ can be shown to decay as
$ \epsilon = O(M^{-1/4}\,w(\mathcal K)^{1/2})$
for a general subset $ \mathcal K$ and, up to some log factors, as $ \epsilon = O(M^{-1})$ for sparse vector sets.

Open problem: I'm now wondering whether the tools and results above could be extended to other quantizers $ \mathcal Q$, such as the universal quantizer defined in [6]. This periodic quantizer has been shown to provide exponentially decaying distortions [6,13] for the embedding of sparse vectors, with interesting applications (e.g., information retrieval). Knowing whether this holds for other vector sets and for other (sub-Gaussian) sensing matrices is an appealing open question.

References:

[1] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk. “Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors”. IEEE Transactions on Information Theory, 59(4):2082–2102, 2013.

[2] Y. Plan and R. Vershynin. “Dimension reduction by random hyperplane tessellations”. Discrete & Computational Geometry, 51(2):438–461, 2014.

[3] Y. Plan and R. Vershynin. “Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach”. IEEE Transactions on Information Theory, 59(1):482–494, 2013.

[4] Y. Plan and R. Vershynin. “One-bit compressed sensing by linear programming”. Communications on Pure and Applied Mathematics, 66(8):1275–1297, 2013.

[5] A. Ai, A. Lapanowski, Y. Plan, and R. Vershynin. “One-bit compressed sensing with non-Gaussian measurements”. Linear Algebra and its Applications, 441:222–239, 2014.

[6] P. T. Boufounos. “Universal rate-efficient scalar quantization”. IEEE Transactions on Information Theory, 58(3):1861–1872, March 2012.
[7] V. K. Goyal, M. Vetterli, and N. T. Thao. “Quantized overcomplete expansions in $ \mathbb R^N$: Analysis, synthesis, and algorithms”. IEEE Transactions on Information Theory, 44(1):16–31, 1998.

[8] N. T. Thao and M. Vetterli. “Lower bound on the mean-squared error in oversampled quantization of periodic signals using vector quantization analysis”. IEEE Transactions on Information Theory, 42(2):469–479, March 1996.

[9] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.

[10] V. Chandrasekaran and M. I. Jordan. “Computational and statistical tradeoffs via convex relaxation”. arXiv preprint arXiv:1211.1073, 2012.

[11] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. “The convex geometry of linear inverse problems”. Foundations of Computational Mathematics, 12(6):805–849, 2012.

[12] A. Bandeira, D. G. Mixon, and B. Recht. “Compressive classification and the rare eclipse problem”. arXiv preprint arXiv:1404.3203, 2014.

[13] P. T. Boufounos and S. Rane. “Efficient coding of signal distances using universal quantized embeddings”. In Proc. Data Compression Conference (DCC), Snowbird, UT, March 20–22, 2013.