SPGL1 and TV minimization?
Recently, I was using the SPGL1 toolbox to recover some “compressed sensed” images. As a reminder, SPGL1 implements the method described in “Probing the Pareto frontier for basis pursuit solutions” by Ewout van den Berg and Michael P. Friedlander. It solves the Basis Pursuit DeNoise (or $BP_\sigma$) problem with an error power $\sigma > 0$: $\min_{u\in\mathbb{R}^N} \|u\|_1\ \textrm{such that}\ \|Au - b\|_2 \leq \sigma,$ where $A\in\mathbb{R}^{m\times N}$ is the usual measurement matrix for a measurement vector $b\in \mathbb{R}^m$, and $\|u\|_{1}$ and $\|u\|_{2}$ are the $\ell_1$ and the $\ell_2$ norms of the vector $u\in\mathbb{R}^N$ respectively. In short, as shown by E. Candès, J. Romberg and T. Tao, if $A$ is well behaved, i.e. if it satisfies the so-called Restricted Isometry Property for any $2K$-sparse signal, then the solution of $BP_\sigma$ approximates (with a controlled error) a $K$-sparse (or compressible) signal $v$ such that $b = Av + n$, where $n$ is an additive noise vector with power $\|n\|_2 \leq \sigma$.

The reason for this post is the following: I wonder whether SPGL1 could be “easily” transformed into a solver for Basis Pursuit with the Total Variation (TV) norm, that is, for the minimization problem $TV_{\sigma}$: $\min_{u\in\mathbb{R}^N} \|u\|_{TV}\ \textrm{such that}\ \|Au - b\|_2 \leq \sigma,$ where $\|u\|_{TV} = \|D u\|_1$ and $(D u)_j = (D_1 u)_j + i\,(D_2 u)_j$ is the $j$th component of the complex finite-difference operator applied to the vectorized image $u$ of $N$ pixels (in a set of coordinates $x_1$ and $x_2$). I have used here a “complexification” trick, putting the finite differences $D_1$ and $D_2$ along the directions $x_1$ and $x_2$ in the real part and the imaginary part, respectively, of the complex operator $D$. The TV norm of $u$ is then really the $\ell_1$ norm of $Du$: the modulus $|(Du)_j|$ is exactly the usual (isotropic) gradient magnitude at pixel $j$ (a small sketch of this operator is given below).

This problem is particularly well suited to the reconstruction of compressed sensed images, since most of them are very sparse in the “gradient basis” (see for instance some references about Compressed Sensing for MRI). Minimizing the TV norm, since it is performed in the spatial domain, is also sometimes more efficient than minimizing the $\ell_1$ norm in a particular sparsity basis (e.g. 2-D wavelets, curvelets, …). Therefore, I would say that, as for the initial SPGL1 theoretical framework, it could be interesting to study the Pareto frontier related to $TV_\sigma$, even if the TV norm is only a semi-norm, i.e. $\|u\|_{TV} = 0$ does not imply $u = 0$ but only $u_j = c$ for some constant $c \in \mathbb{R}$.

To explain this point better, let me first summarize the paper of van den Berg and Friedlander quoted above. They propose to solve $BP_\sigma$ through a LASSO problem (or $LS_\tau$) regulated by a parameter $\tau>0$: $\min_{u\in\mathbb{R}^N} \|Au - b\|_2\ \textrm{such that}\ \|u\|_1 \leq \tau.$ If I'm right, the key idea is that there exists a $\tau_\sigma$ such that $LS_{\tau_\sigma}$ is equivalent to $BP_\sigma$; the whole problem is thus to find this $\tau_\sigma$. SPGL1 finds it iteratively, using the fact that the $LS_\tau$ problems define a smooth and decreasing curve in $\tau$ (the Pareto curve) through the $\ell_2$ norm of the residual $r_\tau = b - Au_\tau$, where $u_\tau$ is the solution of $LS_{\tau}$.
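Before going further, here is the promised sketch of the complexified difference operator, just for illustration (nothing below comes from SPGL1 itself, and the boundary convention, a replicated border giving zero differences at the edge, is an arbitrary choice of mine):

```python
import numpy as np

def D(u):
    """Complex finite-difference operator on a 2-D image u:
    real part = differences along x1, imaginary part = along x2.
    Border pixels get a zero difference (replicated border),
    an arbitrary convention chosen for this illustration."""
    d1 = np.diff(u, axis=0, append=u[-1:, :])   # D_1 u
    d2 = np.diff(u, axis=1, append=u[:, -1:])   # D_2 u
    return d1 + 1j * d2

def tv_norm(u):
    """||u||_TV = ||D u||_1: the l1 norm of the complex "gradient",
    i.e. the sum of the isotropic gradient magnitudes."""
    return np.abs(D(u)).sum()

u = np.ones((16, 16))     # a constant image...
print(tv_norm(u))         # ...has zero TV norm (the semi-norm behaviour)
```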
Coming back to the Pareto curve: the function $\phi(\tau) \triangleq \|r_\tau\|_2$ decreases from $\phi(0) = \|b\|_2$ until it reaches, at some $\tau_{\sigma}>0$, the value $\phi(\tau_\sigma) = \sigma$. Interestingly, the derivative $\phi'(\tau)$ exists on $[0, \tau_\sigma]$ and is simply equal to $-\|A^Ty_\tau\|_{\infty}$ with $y_\tau = r_\tau / \|r_\tau\|_2$. As explained above, at the point $\tau=\tau_\sigma$ the problem $LS_\tau$ provides the solution of $BP_\sigma$. But since both $\phi$ and $\phi'$ are known, a Newton method on this Pareto curve can iteratively estimate $\tau_\sigma$ from the implicit equation $\phi(\tau_\sigma) = \sigma$. Practically, this is done by solving $LS_\tau$ only approximately at each $\tau$ (and the convergence of the Newton method remains linear). In the end, the whole approach is very efficient for solving high-dimensional BPDN problems (such as BPDN for images), and the final computational cost is mainly due to the forward and transposed multiplications of the matrix/operator $A$ with vectors.

So what happens now if the $\ell_1$ norm is replaced by the TV norm in this process? If we switch from $BP_\sigma$ to $TV_\sigma$? Is there an “SPGL1 way” to solve that? The function $\phi$ resulting from such a context would now have the initial value $\phi^2(0) = \|b\|^2_2 - (b^TA{\bf 1})^2/\|A{\bf 1}\|^2_2$ (with ${\bf 1}$ the constant vector), since a zero TV norm means a constant image $u = c{\bf 1}$ (the value of $\phi(0)$ arises simply from the minimization over $c$). Notice that if $A$ is for instance a Gaussian measurement matrix, $\phi(0)$ will be very close to $\|b\|_2$, since the expectation of the average of any row is zero. For the rest, I'm unfortunately not sufficiently familiar with convex optimization theory to deduce what $\phi'$ becomes in the TV framework (hum, I should definitely study that).

However, in the $\ell_1$ case, $\phi(\tau)$ (i.e. $LS_\tau$) is computed approximately for each $\tau$. This approximation, which is also iterative, uses a special projection operator to guarantee that the current candidate solution remains feasible during the iterations, i.e. stays in the $\ell_1$ ball $\{u : \|u\|_1 \leq \tau\}$. As usual, this projection is accomplished through a Soft Thresholding procedure, i.e. as the solution of the problem $\min_{u} \|w -u\|^2_2 + \lambda \|u\|_{1}$, where $w$ is the point to project, and where $\lambda>0$ is set so that the projection lands in the $\ell_1$ ball above.

For the TV minimization case, the TV ball $\{u : \|u\|_{TV} \leq \tau\}$ defining the feasible set of the approximate LASSO procedure would possibly generate a projection operator equivalent to the one solving the problem $\min_{u} \|w -u\|^2_2 + \lambda \|u\|_{TV}$. This is somehow related to one of the lessons of the TwIST paper (“A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration”) by J. Bioucas-Dias and M. Figueiredo about the so-called Moreau function: there is a deep link between some iterative resolutions of a regularized BP problem using a given sparsity metric, e.g. the $\ell_1$ or the TV norm, and the canonical denoising method of this metric, i.e. when the measurement is the identity operator, giving Soft Thresholding or TV denoising respectively. Thanks to the implementation of Antonin Chambolle (used also by TwIST), this last canonical TV minimization can be computed very quickly. The three sketches below illustrate, in order, the Newton iteration on the Pareto curve, the $\ell_1$-ball projection, and a possible TV-ball projection.
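First, the Newton iteration. To make the root-finding concrete, here is a schematic Python version using $\phi(\tau) = \|r_\tau\|_2$ and $\phi'(\tau) = -\|A^T y_\tau\|_\infty$; the inner solver `solve_lasso` is a placeholder for the (approximate) $LS_\tau$ solver that SPGL1 actually implements, and is assumed, not provided:

```python
import numpy as np

def newton_on_pareto(A, b, sigma, solve_lasso, tau=0.0, tol=1e-6, max_iter=20):
    """Schematic Newton root-finding for phi(tau) = sigma on the Pareto curve.
    `solve_lasso(A, b, tau)` is an assumed placeholder returning the
    (approximate) solution u_tau of LS_tau; it is NOT provided here."""
    for _ in range(max_iter):
        u_tau = solve_lasso(A, b, tau)
        r = b - A @ u_tau                                # residual r_tau
        phi = np.linalg.norm(r)                          # phi(tau) = ||r_tau||_2
        if abs(phi - sigma) <= tol:
            break
        dphi = -np.linalg.norm(A.T @ (r / phi), np.inf)  # phi'(tau)
        tau = tau + (sigma - phi) / dphi                 # Newton update
    return tau, u_tau
```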
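Second, the $\ell_1$-ball projection. Here is one standard way to compute it (a sort-based search for the threshold $\lambda$; this is my own sketch, not SPGL1's actual code): soft thresholding with $\lambda$ chosen so that the result lands exactly on the ball.

```python
import numpy as np

def project_l1_ball(w, tau):
    """Euclidean projection of w onto {u : ||u||_1 <= tau}, obtained by
    soft thresholding with the threshold lambda set (here via sorting)
    so that the projected point lies exactly on the l1 ball."""
    if np.abs(w).sum() <= tau:
        return w.copy()                     # w is already feasible
    a = np.sort(np.abs(w))[::-1]            # magnitudes, descending
    cum = np.cumsum(a)
    k = np.arange(1, a.size + 1)
    last = np.nonzero(a - (cum - tau) / k > 0)[0][-1]
    lam = (cum[last] - tau) / (last + 1)    # the soft threshold
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0)
```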
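Third, the TV-ball projection. If a fast TV denoiser is available (e.g. Chambolle's algorithm, abstracted here as a black-box function `tv_denoise(w, lam)` assumed to solve $\min_u \|w-u\|^2_2 + \lambda\|u\|_{TV}$), one could imagine reaching the projection by searching for the $\lambda$ that puts the denoised point on the ball, for instance by bisection. This is only a naive sketch of the idea, not a validated algorithm:

```python
def project_tv_ball(w, tau, tv_denoise, tv_norm, lam_max=1e3, n_iter=50):
    """Naive (unvalidated) projection onto the TV ball {u : ||u||_TV <= tau}:
    bisection on lambda, using an external TV denoiser as a black box.
    tv_denoise(w, lam) is ASSUMED to solve min_u ||w - u||^2 + lam * ||u||_TV,
    whose solution has a TV norm that decreases as lam increases."""
    if tv_norm(w) <= tau:
        return w                          # w is already inside the ball
    lo, hi = 0.0, lam_max                 # bracket for the right lambda
    u_feas = tv_denoise(w, hi)            # feasible if lam_max is large enough
    for _ in range(n_iter):
        lam = 0.5 * (lo + hi)
        u = tv_denoise(w, lam)
        if tv_norm(u) > tau:
            lo = lam                      # not enough smoothing: raise lambda
        else:
            hi, u_feas = lam, u           # feasible: keep it, lower lambda
    return u_feas
```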
Therefore, if needed, the required projection on the TV ball above could also be inserted in a potential “SPGL1 for TV sparsity” solver. OK… I agree that all this is just a very rough intuition. There are a lot of points to clarify and develop. However, if you know something about all that (or if you detect that I'm totally wrong), or if you just want to comment on this idea, feel free to use the comment box below…