Thursday, July 03, 2008

CS: Toeplitz CS Matrices, Better IHT, ICML 08 papers, Trace Norm Minimization, Shift Invariant Sparse Coding, Matrix Factorization


Deterministic Designs with Deterministic Guarantees: Toeplitz Compressed Sensing Matrices, Sequence Designs and System Identification by Venkatesh Saligrama. The abstract reads:

In this paper we present a new family of discrete sequences having "random like" uniformly decaying auto-correlation properties. The new class of infinite length sequences are higher order chirps constructed using irrational numbers. Exploiting results from the theory of continued fractions and diophantine approximations, we show that the class of sequences so formed has the property that the worst-case auto-correlation coefficients for every finite length sequence decay at a polynomial rate. These sequences display Doppler immunity as well. We also show that Toeplitz matrices formed from such sequences satisfy the restricted isometry property (RIP), a concept that has played a central role recently in Compressed Sensing applications. Compressed sensing has conventionally dealt with sensing matrices with arbitrary components. Nevertheless, such arbitrary sensing matrices are not appropriate for linear system identification and one must employ Toeplitz structured sensing matrices. Linear system identification plays a central role in a wide variety of applications such as channel estimation for multipath wireless systems as well as control system applications. Toeplitz matrices are also desirable on account of their filtering structure, which allows for fast implementation together with reduced storage requirements.
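To make the measurement structure concrete, here is a small numerical sketch, entirely my own toy construction and not the sequences of the paper: a chirp-like sequence with an irrational phase parameter is turned into a Toeplitz matrix, which then acts as the CS measurement operator. The dimensions, the choice of phase parameter alpha, and the coherence printout are all illustrative assumptions.

import numpy as np
from scipy.linalg import toeplitz

# Illustrative sketch only (not the paper's construction): a Toeplitz sensing
# matrix A built from a deterministic chirp-like sequence, used as y = A @ x.
n, m, k = 256, 64, 5                         # signal length, measurements, sparsity
alpha = np.sqrt(2)                           # irrational phase parameter (assumed)

t = np.arange(n + m - 1)
seq = np.cos(2 * np.pi * alpha * t**2 / n)   # a second-order chirp sequence

# m x n Toeplitz matrix: first column seq[n-1:], first row seq[n-1::-1].
# This is the filtering/convolution structure the abstract refers to.
A = toeplitz(seq[n - 1:], seq[n - 1::-1])
A /= np.linalg.norm(A, axis=0)               # normalize columns

# A k-sparse test signal and its compressed measurements.
rng = np.random.default_rng(0)
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x

# Mutual coherence as a rough (non-RIP) sanity check of the design.
G = np.abs(A.T @ A) - np.eye(n)
print("max off-diagonal coherence:", G.max())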
Found on Thomas Blumensath's site:

Thomas Blumensath and Mike Davies; Iterative Hard Thresholding for Compressed Sensing. [This revised version is now available with an improvement of sqrt(2) in terms of the required isometry constant. The uniform bounds are now better for Iterative Hard Thresholding than for CoSaMP. The version on arXiv: arXiv:0805.0510v1 [cs.IT] has not yet been updated.]
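For readers who want to try it, the basic iterative hard thresholding recursion is short enough to sketch: x <- H_k(x + A^T (y - A x)), where H_k keeps the k largest-magnitude entries. Below is a minimal numpy version; the fixed step size, the rescaling of A, and the stopping rule are my own simplifications, not the tuned choices analyzed in the paper.

import numpy as np

def iht(y, A, k, n_iter=300):
    # Iterative hard thresholding sketch: x <- H_k(x + A^T (y - A x)).
    # Assumes A has been rescaled so its spectral norm is at most 1;
    # no acceleration, adaptive step size, or convergence test.
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + A.T @ (y - A @ x)           # gradient step on 0.5*||y - A x||^2
        idx = np.argsort(np.abs(x))[:-k]    # indices of all but the k largest
        x[idx] = 0.0                        # hard threshold to sparsity k
    return x

# Toy example with a Gaussian matrix rescaled to unit spectral norm.
rng = np.random.default_rng(1)
m, n, k = 128, 256, 5
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, 2)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true
print("recovery error:", np.linalg.norm(iht(y, A, k) - x_true))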

Jort Gemmeke mentioned to me three papers related to CS featured at ICML 2008 (which starts next week). I think I have already covered them, so I'll just cut and paste the links and abstracts:

  • Multi-Task Compressive Sensing with Dirichlet Process Priors by Yuting Qi, Dehong Liu, David Dunson, Lawrence Carin

    Abstract
    Compressive sensing (CS) is an emerging field that, under appropriate conditions, can significantly reduce the number of measurements required for a given signal. In many applications, one is interested in multiple signals that may be measured in multiple CS-type measurements, where here each signal corresponds to a sensing 'task'. In this paper we propose a novel multitask compressive sensing framework based on a Bayesian formalism, where a Dirichlet process (DP) prior is employed, yielding a principled means of simultaneously inferring the appropriate sharing mechanisms as well as CS inversion for each task. A variational Bayesian (VB) inference algorithm is employed to estimate the full posterior on the model parameters.
  • Compressed Sensing and Bayesian Experimental Design by Matthias W. Seeger, Hannes Nickisch.

    Abstract
    We relate compressed sensing (CS) with Bayesian experimental design and provide a novel efficient approximate method for the latter, based on expectation propagation. In a large comparative study about linearly measuring natural images, we show that the simple standard heuristic of measuring wavelet coefficients top-down systematically outperforms CS methods using random measurements; the sequential projection optimisation approach of (Ji & Carin, 2007) performs even worse. We also show that our own approximate Bayesian method is able to learn measurement filters on full images efficiently which outperform the wavelet heuristic. To our knowledge, ours is the first successful attempt at “learning compressed sensing” for images of realistic size. In contrast to common CS methods, our framework is not restricted to sparse signals, but can readily be applied to other notions of signal complexity or noise models. We give concrete ideas how our method can be scaled up to large signal representations.

    Also of interest is the following talk: Large Scale Approximate Inference and Experimental Design for Sparse Linear Models by Matthias Seeger.

    Hannes Nickisch has developed FWTN, a fast wavelet transform for D-dimensional data in L levels, with C code including a Matlab MEX file and a Matlab demo script.


  • Sparse Bayesian Nonparametric Regression by Francois Caron, Arnaud Doucet.
    Abstract
    One of the most common problems in machine learning and statistics consists of estimating the mean response Xβ from a vector of observations y assuming y = Xβ + ε, where X is known, β is a vector of parameters of interest and ε a vector of stochastic errors. We are particularly interested here in the case where the dimension K of β is much higher than the dimension of y. We propose some flexible Bayesian models which can yield sparse estimates of β. We show that as K → ∞ these models are closely related to a class of Lévy processes. Simulations demonstrate that our models outperform significantly a range of popular alternatives.

  • Finally, three other papers grabbed my attention; the first two are not directly CS related:

    Column subset selection, matrix factorization, and eigenvalue optimization by Joel Tropp. The abstract reads:

    Given a fixed matrix, the problem of column subset selection requests a column submatrix that has favorable spectral properties. Most research from the algorithms and numerical linear algebra communities focuses on a variant called rank-revealing QR, which seeks a well-conditioned collection of columns that spans the (numerical) range of the matrix. The functional analysis literature contains another strand of work on column selection whose algorithmic implications have not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that each matrix with normalized columns contains a large column submatrix that is exceptionally well conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic. This paper presents a randomized, polynomial-time algorithm that produces the submatrix promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing the Grothendieck factorization. These ideas also result in a novel approximation algorithm for the (∞, 1) norm of a matrix, which is generally NP-hard to compute exactly. As an added bonus, this work reveals a surprising connection between matrix factorization and the famous maxcut semidefinite program.

    and

    Consistency of Trace Norm Minimization by Francis Bach. The abstract reads:
    Regularization by the sum of singular values, also referred to as the trace norm, is a popular technique for estimating low rank rectangular matrices. In this paper, we extend some of the consistency results of the Lasso to provide necessary and sufficient conditions for rank consistency of trace norm minimization with the square loss. We also provide an adaptive version that is rank consistent even when the necessary condition for the non-adaptive version is not fulfilled.
    A small numerical illustration of the singular value thresholding step behind this kind of minimization follows after this list.
    Finally, of interest with regard to dictionary building:

    Shift Invariant Sparse Coding of Image and Music Data by Morten Mørup, Mikkel Schmidt, Lars Kai Hansen. The abstract reads:

    When analyzing multi-media data such as image and music it is useful to extract higher-level features that constitute prominent signatures of the data. We demonstrate how a 2D shift invariant sparse coding model is capable of extracting such higher level features forming so-called icon alphabets for the data. For image data the model is able to find high-level prominent features while for music the model is able to extract both the harmonic structure of instruments as well as indicate the scores they play. We further demonstrate that non-negativity constraints are useful since they favor part based representation. The success of the model relies on finding a good value for the degree of sparsity. For this, we propose an ‘L-curve’-like argument and use the sparsity parameter that maximizes the curvature in the graph of the residual sum of squares plotted against the number of non-zero elements of the sparse code. Matlab implementation of the algorithm is available for download.
    The Non-Negative Shift Invariant Sparse Coding algorithm is here and is featured in the dictionary section of the big picture. A small numerical sketch of the ‘L-curve’-style curvature criterion mentioned in the abstract follows below.
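Since the sparsity-selection rule in the Mørup et al. abstract is easy to gloss over, here is a small numerical sketch of that kind of 'L-curve' criterion under my own assumptions: the residual-sum-of-squares values below are synthetic (in practice they would come from re-running the sparse coding model at each sparsity setting), and the rule simply picks the sparsity level of maximum discrete curvature.

import numpy as np

def max_curvature_index(nnz, rss):
    # Discrete curvature of the (nnz, rss) curve; the selected sparsity level
    # is the one where the curvature is largest (the 'corner' of the L-curve).
    x, y = np.asarray(nnz, float), np.asarray(rss, float)
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    curvature = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    return int(np.argmax(curvature))

# Synthetic residual-sum-of-squares curve over candidate sparsity levels.
nnz = np.arange(10, 510, 10)
rss = 1.0 / (1.0 + 0.05 * nnz) + 0.001 * nnz
best = max_curvature_index(nnz, rss)
print("selected number of non-zero elements:", nnz[best])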

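As promised above after the Bach abstract, here is a tiny illustration, with my own toy setup rather than Bach's estimator, of the singular value soft-thresholding step that most trace-norm (nuclear-norm) minimization algorithms build on.

import numpy as np

def svt(M, tau):
    # Singular value thresholding: the proximal operator of the trace norm,
    # i.e. argmin_X 0.5*||X - M||_F^2 + tau*||X||_*.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Toy usage: recover a low-rank matrix from a noisy observation.
rng = np.random.default_rng(2)
L = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # rank 3
noisy = L + 0.1 * rng.standard_normal(L.shape)
estimate = svt(noisy, tau=1.0)
print("rank of estimate:", np.linalg.matrix_rank(estimate))
print("relative error:", np.linalg.norm(estimate - L) / np.linalg.norm(L))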
    Also of interest is Trac Tran's presentation at Simon Fraser University entitled: Fast Efficient and Practical Algorithms for Compressed Sensing.

    Credit: NASA/JPL-Caltech/University of Arizona/Texas A&M. Peter Pan print as seen from Phoenix, Sol 36.

    1 comment:

    Anonymous said...

    nice post,

    I wish you or Prof. Tropp would have given an intuitive comment about:

    "COLUMN SUBSET SELECTION, MATRIX FACTORIZATION, AND EIGENVALUE OPTIMIZATION"

    It would be nice to know the practical significance of the proposed algorithm and potential applications.


    Regards,
    Kayhan
