Thursday, May 28, 2009

CS: Topics in Compressed Sensing, GraDes and other ICML '09 papers

Laurent Jacques let me know that Deanna Needell just released her thesis entitled: "Topics in Compressed Sensing". Congratulations Deanna.

Thanks to Lianlin Li's blog on compressed sensing (in Chinese), I noticed that I had forgotten the following interesting and probably important paper: Gradient Descent with Sparsification: An iterative algorithm for sparse recovery with restricted isometry property by Rahul Garg and Rohit Khandekar. The abstract reads:

In this paper, we present an algorithm for finding an $s$-sparse vector $x$ that minimizes the square-error $\| y - \Phi x \|^2$ where $\Phi$ satisfies the restricted isometry property (RIP). Our algorithm, called GraDeS (Gradient Descent with Sparsification), starts from an arbitrary $s$-sparse $x$ and iteratively updates it as: $$x \leftarrow H_s\left(x + \frac{1}{\gamma} \cdot \Phi^T(y - \Phi x)\right)$$ where $\gamma > 1$ is a constant and $H_s$ sets all but the largest $s$ coordinates in absolute value to zero.

We show that GraDeS, in a constant number of iterations, computes the correct $s$-sparse solution to the system $y = \Phi x$ where $\Phi$ satisfies the condition that the isometric constant $\delta_{2s} < 1/3$. Our Matlab implementation of GraDeS outperforms previously proposed algorithms like Subspace Pursuit, StOMP, OMP, and Lasso by an order of magnitude. Curiously, our experiments also uncovered several cases where L1-regularized regression (Lasso) fails but GraDeS finds the correct solution.
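
For readers who want to play with the update above, here is a minimal numpy sketch (mine, not the authors' code): a gradient step on the squared error followed by hard thresholding to the $s$ largest-magnitude coordinates. The default step size and the fixed iteration count are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def grades_sketch(y, Phi, s, gamma=4.0/3.0, n_iter=50):
    """Sketch of a GraDeS-style iteration: gradient step, then hard thresholding.
    gamma and n_iter are illustrative choices, not the paper's tuned values."""
    x = np.zeros(Phi.shape[1])                            # an (arbitrary) s-sparse starting point
    for _ in range(n_iter):
        x = x + (1.0 / gamma) * (Phi.T @ (y - Phi @ x))   # gradient step on ||y - Phi x||^2
        # H_s: keep only the s largest coordinates in absolute value
        x[np.argsort(np.abs(x))[:-s]] = 0.0
    return x
```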
The potential importance of the results shown in this paper justifies the need for benchmark tests, as mentioned earlier this week. The ICML conference also gives you the ability to discuss the paper here, on its discussion page.

Which brings me to note that the ICML 2009 abstracts and full papers are out. The conference will take place June 14-18 in Montreal. Other papers from that conference that caught my interest include:

Online Dictionary Learning for Sparse Coding by Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro. The abstract reads:
Sparse coding---that is, modelling data vectors as sparse linear combinations of basis elements---is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on learning the basis set, also called dictionary, to adapt it to specific data, an approach that has recently proven to be very effective for signal reconstruction and classification in the audio and image processing domains. This paper proposes a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples. A proof of convergence is presented, along with experiments with natural images demonstrating that it leads to faster performance and better dictionaries than classical batch algorithms for both small and large datasets.
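
As a rough illustration of the online flavor of this approach (and not the authors' exact algorithm, which relies on an efficient lasso solver and careful bookkeeping), here is a numpy sketch of one online step: sparse-code the incoming sample, accumulate sufficient statistics, then refresh the dictionary atoms by block coordinate descent. The ISTA loop and parameter values are illustrative assumptions.

```python
import numpy as np

def online_dict_step(x, D, A, B, lam=0.1, n_ista=100):
    """One simplified online dictionary-learning step (sketch only)."""
    # 1) Sparse coding of sample x with the current dictionary D (crude ISTA for the lasso).
    alpha = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2 + 1e-12        # Lipschitz constant of the gradient
    for _ in range(n_ista):
        z = alpha - D.T @ (D @ alpha - x) / L    # gradient step
        alpha = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold

    # 2) Accumulate sufficient statistics of past codes and samples.
    A += np.outer(alpha, alpha)
    B += np.outer(x, alpha)

    # 3) Update each atom by block coordinate descent, keeping atoms in the unit ball.
    for j in range(D.shape[1]):
        if A[j, j] > 1e-12:
            u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
            D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D, A, B, alpha
```

Streaming samples (or mini-batches) through a step like this one at a time is what lets the method scale to datasets with millions of training samples without ever touching the full dataset at once.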

On a related matter, the matrix completion crowd might be interested in this: An Accelerated Gradient Method for Trace Norm Minimization by Shuiwang Ji and Jieping Ye. The abstract reads:
We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such formulation finds applications in many machine learning tasks including multi-task learning, matrix classification, and matrix completion. The standard semidefinite programming formulation for this problem is computationally expensive. In addition, due to the non-smoothness nature of the trace norm, the optimal first-order black-box method for solving such class of problems converges as O(1/sqrt(k)), where k is the iteration counter. In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). We further propose an accelerated gradient algorithm, which achieves the optimal convergence rate of O(1/k^2) for smooth problems. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms.
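
To make the construction more concrete, here is a hedged numpy sketch of the kind of machinery involved: the proximal operator of the trace norm is singular value soft-thresholding, and wrapping it in a Nesterov-style acceleration gives the O(1/k^2) rate mentioned in the abstract. The gradient function grad_f and the Lipschitz constant L are placeholders the caller supplies; this is a generic accelerated proximal gradient loop, not the authors' exact method.

```python
import numpy as np

def svt(Z, tau):
    """Singular value thresholding: the proximal operator of tau * (trace norm)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def accelerated_trace_norm(grad_f, L, lam, shape, n_iter=100):
    """Nesterov-accelerated proximal gradient for min_W f(W) + lam * ||W||_* (sketch)."""
    W = np.zeros(shape)
    Z, t = W.copy(), 1.0
    for _ in range(n_iter):
        W_new = svt(Z - grad_f(Z) / L, lam / L)          # gradient step + trace-norm prox
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = W_new + ((t - 1.0) / t_new) * (W_new - W)    # momentum / extrapolation
        W, t = W_new, t_new
    return W
```

For matrix completion, for instance, f(W) would be the squared error on the observed entries and grad_f(Z) the residual masked to those entries.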
In a different direction, a manifold-based technique for producing a distance on a manifold is presented in Geometry-aware Metric Learning by Zhengdong Lu, Prateek Jain and Inderjit Dhillon. The abstract reads:
In this paper, we introduce a generic framework for semi-supervised kernel learning. Given pairwise (dis-)similarity constraints, we learn a kernel matrix over the data that respects the provided side-information as well as the local geometry of the data. Our framework is based on metric learning methods, where we jointly model the metric/kernel over the data along with the underlying manifold. Furthermore, we show that for some important parameterized forms of the underlying manifold model, we can estimate the model parameters and the kernel matrix efficiently. Our resulting algorithm is able to incorporate local geometry into the metric learning task; at the same time, it can handle a wide class of constraints and can be applied to various applications. Finally, our algorithm is fast and scalable -- unlike most of the existing methods, it is able to exploit the low-dimensional manifold structure and does not require semi-definite programming. We demonstrate the wide applicability and effectiveness of our framework by applying it to various machine learning tasks such as semi-supervised classification, colored dimensionality reduction, manifold alignment, etc. On each of these tasks our method performs competitively with or better than the respective state-of-the-art method.
and finally, An Efficient Projection for L1,Infinity Regularization by Ariadna Quattoni, Xavier Carreras, Michael Collins and Trevor Darrell. The abstract reads:
In recent years the L1,Infinity norm has been proposed for joint regularization. In essence, this type of regularization aims at extending the L1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of L1,Infinity regularized problems. The main challenge in developing such a method resides in being able to compute efficient projections onto the L1,Infinity ball. We present an algorithm that works in O(n log n) time and O(n) memory where n is the number of parameters. We test our algorithm in a multi-task image annotation problem. Our results show that L1,Infinity leads to better performance than both L2 and L1 regularization and that it is effective in discovering jointly sparse solutions.
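
To make the "jointly sparse" idea concrete, here is a small numpy illustration (mine, not the paper's projection algorithm) of the L1,Infinity norm of a features-by-tasks weight matrix: because each feature contributes its largest absolute weight across tasks, penalizing this norm drives entire feature rows to zero, so all tasks share the same small set of active features.

```python
import numpy as np

def l1_inf_norm(W):
    """L1,Infinity norm of a (features x tasks) weight matrix:
    sum over features of the largest absolute weight across tasks."""
    return np.abs(W).max(axis=1).sum()

W = np.array([[0.0, 0.0, 0.0],    # feature unused by every task (a "zero row")
              [1.5, -0.2, 0.7],   # feature shared by all tasks
              [0.0, 0.0, 0.3]])   # feature used by one task only
print(l1_inf_norm(W))             # 0.0 + 1.5 + 0.3 = 1.8
```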

2 comments:

JackD said...

About the GraDes method proposed in "Gradient Descent with Sparsification: An iterative algorithm for sparse recovery with restricted isometry property", I cannot see the difference between this method and the Iterative Hard Thresholding (IHT) by T. Blumensath and M. Davies.

If anybody could help me to understand where the difference is ...

Jack

JackD said...

Sorry, got it. It's in the way they select the update factor gamma: it is tuned to the RIP constant. Right? IHT sets this constant to 1 (the authors mention, however, a possible extension to other values).
