Nuit Blanche community
@Google+(2008) || @Facebook (358) || @Reddit (1836)
Compressive Sensing @LinkedIn (3878)
Advanced Matrix Factorization @Linkedin (1272)
Paris Machine Learning ( MLParis.org )
@Meetup.com (5825 members) || @archives || @LinkedIn (1761) || @Google+(473) ||
@Facebook (327) || @Twitter (1155 followers)
The Big Picture in Compressive Sensing|| Learning Compressive Sensing ||
Advanced Matrix Factorization Jungle Page ||
Highly Technical Reference Pages - Aggregators ||
These Technologies Do Not Exist || CAI: Cable And Igor's Adventures in Matrix Factorization || Search || Reproducible Research page
Friday, July 20, 2007
Our principal result is the discovery of a sharp threshhold ρ∗ ≈ 0.239, so that if ρ < ρ∗ and A is a random m × n encoding matrix of independently chosen standard Gaussians, where m = O(n), then with overwhelming probability over choice of A, for all x ∈ Rn, LP decoding corrects ρm arbitrary errors in the encoding Ax, while decoding can be made to fail if the error rate exceeds ρ∗.
Comparing the results of section 7 with those of section 4, we note that while near-perfect decoding is information theoretically possible for error rates up to a half, LP decoding fails at much lower error rates. It is thus natural to look for other efficient decoding procedures for the case of adversarial error.
We show that by replacing the L1 norm with the Lp norm with p < 1, exact reconstruction is possible with substantially fewer measurements. We give a theorem in this direction, and many numerical examples, both in one complex dimension, and larger scale examples in two real dimensions.
 Loss function semantics.
 Rice Compressed sensing page.
 Object recognition in the cortex
Wednesday, July 18, 2007
SPGL1 is a solver for large-scale sparse reconstruction problems. For a given noise-level
it can solve (BP)minimizex1subject toAx–b2
Alternatively, it can solve the underdetermined Lasso problem
for a given
. SPGL1 relies only on matrix-vector operations Axand ATyand accepts both explicit matrices, and functions that evaluate these products. In addition, SPGL1 supports the complex-variables case, and solves the true complex one-norm regularized problem.
The support for complex variables is handy in the case of noiselets.
Resource: Rice Compressed Sensing Library.
Tuesday, July 17, 2007
thrice (with yet another mirror)
four times (....)
five times (...)
....5 millions times (...)
until you reach the last 10 millionth mirror.
You tell the set of mirrors, on that DMD chip, to display a set of random tilings. That way, a random set of mirrors are shining the incoming light unto the detector.
You do this once with an initial random tiling and obtain your first CS measurement
then you do this again with a third random tiling, this is your third CS measurement
and so on.
The reason this second method works stems from the idea that most natural images are sparse in bases ranging from cosines, wavelets to curvelets (this is also why JPEG does a tremendous job in decreasing the size of most images). Functions that represent random tilings of reflective and non reflective mirrors (0s and 1s) are said to be mathematically "incoherent" with these bases thereby allowing an automatic compression at the detector level (here in the second mode, there is no need for compression with JPEG at the very end since the CS measurements are already compressed version of the image). A computational steps is required to obtain a human viewable image from these CS measurements. That step uses these solvers.
What are the pros and cons of the second option compared to the first one ?
- The overall sensor requires very low power because there is no CPU/GPU/FPGA trying to perform the compression stage at the very end (JPEG).
- The sensor is dedicated to acquiring information. Information processing can be done somewhere else ( think on some other planet)
- Compared to raw images (raster mode output), the information to be transmitted is very much compressed albeit not optimally (compared to JPEG).
- Instead you can spend all your money designing the very best sensitive pixel you want, it may even act as a spectrometer (looking at many different energy bands), radiation detector and so forth.
- Last but not least, the amount of light that goes to the detector is about half of the 10 million mirrors, which is quite high compared to the single mirror exposure in the raster mode (first method). In other words, the signal to noise ratio is pretty high (a good thing) in the CS mode as opposed to the raster mode.
- Faint signals could be submerged in the CS measurements. You first need to have a high signal to noise ratio signal to detect small signals.
- Your sensor has to have a much larger dynamic range than in the raster mode in order for the A/D quantization algorithm to not mess with the CS measurements.
- The number of CS measurements will be higher than the number of bits required to store the JPEG of the raster mode (about 4 to 5 times higher). Then again (and this is a pro-argument, the transformation from raster mode to JPEG is the power hungry stage of your camera and the reason you always need to recharge it, memory on the other hand is cheap.)
Terry Tao provides a much clearer mathematical explanation albeit with no beautifully hand crafted images such as the ones found here. All information about this single pixel camera system can be found at Rice University.
[ Update 2013: Inview Corporation develops single pixel cameras using the intellectual property from Rice University ]
- Shihao Ji, Lawrence Carin,Ya Xue, David Dunson have built a webpage on Bayesian Compressive Sensing and have also released a version 0.1 of their software as featured on Bayesian Compressive Sensing
- SMAI-Thales meeting in October where two talks on Compressed Sensing will be presented in French in Paris.
- The Von Neuman symposium has passed and Muthu has some thoughts on what was interesting.
- Texas A&M will host a conference on Approximation and Learning in High Dimensions on October 19-21, 2007 (Thanks Cable)
I recently did a guest appearance on John Langford's blog on the subject of Machine Learning and Compressed Sensing. Andrew Gelman also talked about Compressed Sensing as a Universal Dimension Reduction technique by Mike Wakin and Rich Baraniuk in his blog (Statistical Modeling, Causal Inference, and Social Science). Overall, the response was very interesting either through personal e-mail or in the comments section. What comes out of this exercise is a list of elements that had been difficult for me to understand when I first looked at it and then communicate:
- There is a confusion that the L1 regularization providing L0 capabilities is the main reason as to why compressed sensing works. This is not truly the case. Historically it has enabled it but it is not required. One can also point to the limits of this statement ( Breakdown Point of Model When the Number of Variables Exceeds the Observations)
- There is an enormous confusion on the reason why Compressed Sensing is "better" than the Johnson-Lindenstrauss Lemma. In short, Compressed Sensing uses the same techniques as JL but Compressed Sensing provides a tighter bound (A simple proof of the restricted isometry property for random matrices) for specific random projections.
- There is the thought that Compressed Sensing is the same as random projections. This is not the case, one could acquire analog signals of Legendre polynomials coefficients and obtain results (provided the Legendre polynomials are incoherent with the image/signal of interest). Also the random projections currently used in the Compressed Sensing literature are coming from a very specific family of distributions. The current random projections used in Machine learning, here for instance, fulfill obviously the JL lemma but not the RIP constraint of Compressed Sensing. Here is a good description by Terry tao on trying to build matrices that have the RIP (or UUP) property.
- I am always eyeing a solution to dimensionality reduction in the context of imagery where a large amount of pixels need to be reduced to smaller set of most important features. This is not the type of problem some other people face. In the statistics and social science world however, datasets generally have fewer data dimension in the first place and so a dimensionality reduction does not bring a tremendous advantage as found in the imagery business.
- In the Compressed Sensing approach, every new measurement is democratically adding information to the scene of interest. This is different than say, PCA where the first few eigenvectors are supposed to bring the most salient features.
- In particular, there are currently no known sparse random projection matrices fulfilling the RIP i.e. taking a random sample of the signal of interest currently require a full sampling, not a sparse sample. The only sparse random projection used in compressed sensing that I know about, eventually uses group testing to get the full information later.
- Compressed Sensing with dictionary of bases is well understood but the new concept of using a similar approach to decompose manifolds clearly requires new or additional wording. It would seem to me that it can be better put by saying that the manifold has a sparse decomposition in the basis of eigenfunctions of the Laplacian operator on that manifold/graph (diffusion geometries and wavelets of Mauro Maggioni, Stephane Lafon an Ronald Coifman).
Finally, there was this issue as to why the image reconstructed from the Rice one-pixel camera is still blurry. This is my abbreviated response to this comment:
With regards to the blurring ( as can be seen here). I agree even with the 40% being blurry but the Rice CS camera is not a purely analog camera: i.e the picture is projected on some Haar basis (every pixel shining is a step function over that area). In effect in order to have good reconstruction, one needs to have good incoherence between the target space and the CS signal space. Haar bases are good at decomposing natural images so I would not expect very good incoherence between the Haar basis and the wavelet basis (used for the reconstruction). If the Rice CS camera were to be replicating diracs instead of Haar functions, you’d [probably] get less blurry images. A similar note was made by Terry Tao in one of the comments in his blog.
Also you have similar blurry reconstructions from live images from the Random Lens Imager at MIT that uses the compressed sensing framework.... one of the reason is that we are not using 3d functions to do the reconstruction as the camera is certainly sensitive to depth as well. Please note the need for Machine Learning techniques to do the calibration of these imagers.
Ever since that comment, I found a mention of a basis that can be designed to be incoherent with the Haar Basis: the noiselet basis of Coifman, Geshwind and Meyer  for which I can find no public paper describing their constructions other than that of Romberg and Candes. It may be interesting to see how the reconstruction mentioned above fares with this noiselet basis. It would also be judicious to take photos of brownian or chaotic (high dimensional) systems so that there is a better public understanding of the connection between the sampling rate and reconstruction requirements. Similarly, one could play with the compressed sensing imaging of a, rare to find in Texas , Romanesco. Plain Broccolis should do. One could also try to make the connection between the fact that in order for Compressed sensing to work well there is a need for the distribution of projection coefficients to follow a power law and the fact that some objects (like the romanesco) have features (curvelet-like) that follow a power law. Obviously this could be done without the one-pixel camera but a real experiment always trumps a computer one.
Reference:  R. Coifman, F. Geshwind, and Y. Meyer. Noiselets. Appl. Comp. Harmonic Analysis, 10:27–44, 2001
Friday, July 13, 2007
In the search for the Tenacious, there were several sensors used at different times:
- Mark One Eyeball from Coast Guards or from some private parties or from onlookers from the coast
- sensors used by the Coast Guard in Planes and Boats
- sensors (Radar, visual, IR, multispectral) from satellites or high altitude planes
- webcams looking at the SF port and bay.
Each and every one of these sensors give some information about their field of view but they are limited by their capabilities. The information from the sensor is dependent on its resolution and other elements. While the issue of resolution is well understood, at least spatially, sensor visibility is dependent on:
- cloud cover (high altitude, satellites), haze (low altitude)
- the calmness of the sea
- the orientation of the sensor (was the object of interest in the sensor cone ?)
- the ability of the sensor to discriminate the target of interest from the background (signature of the target)
- the size of the target (are we looking for a full boat or debris ?)
In the aerospace business, some of us use software like STK that provides different modules in order to schedule and understand information about specific satellite trajectories and so forth. It may be a good add-on to the current SAROPS capabilities in terms of quantifying the field of view.
But the main issue is really about building the right probability distribution as the search goes on and how one can add any heterogenous information into a coherent view of the search.
Time is also a variable that becomes more and more important as the search goes. In particular it is important to figure out the ability to do data fusion with time stamped data. One can see in this presentation, that while the search grid is regular, one can see some elements drifting out of the field of view as the search is underway. So the issue is really about quantifying data fusion with sensors input as well as maritime currents and provide a probability of escaping the search grid. SAROPS already does some of this, but I am not sure the timing element of the actual search (made by CG planes, boat) is entered in the software as the search go on. It was difficult for us to get back that timing from the search effort (it was rightfully not their priority) and one simply wonders if this is an input to SAROPS when iterating on the first empty searches. If one thinks along the lines of the 8000 containers scenario, this is important as it has been shown that some of these containers have different lifespan at sea level and right under the surface. In this case, the correlation between time stamped sensor outputs become central as a submerged but within a few feet underwater containers may not be viewable from specific sensors (but would remain dangerous to navigation). Also this is not because we did not see anything on the second path at the same location (provided no current) that the object is not here anymore, rather the sensor did not detect it. In the illustration below one can see the different targets found by the Radarsat/John Hopkins team for the Tenacious. Without time stamp it is nearly impossible to make a correlation between hits on the first and the second satellite path.
The bayesian framework seems to have already been adopted by SAROPS and previous versions. It may need some additional capabilities to take into account most the issues mentioned above (sensor network or EPH). In either case, a challenge of some kind, with real data might be a way to advance the current state of the art.
Thursday, July 12, 2007
In a previous entry, I mentioned the most recent Compressed Sensing Hardware implementations I knew about. According to the recent presentation by Justin Romberg at the Short Course on Sparse Representations and High Dimensional Geometry at IPAM, it looks as though I missed two other implementations as described by Romberg in his talk:
* a Hyperspectral imager at Yale where algorithms of Ronald Coifman and Mauro Maggioni are being used against data gathered from a Texas Instrument DMD and a tuned laser light. You can see in the abstract of the article where a feature recognition is doing better when using random projection for hyperspectral imagery:
This leads us to ask how discriminatory spectral features should be selected. The features in previous work on cancer spectroscopy have been chosen according to heuristics. We use the “best basis” algorithm to select a Haar wavelet packet basis which is optimal for the discrimination task at hand. These provide interpretable spectral features consisting of contiguous wavelength bands. However they are outperformed by features which use information from all parts of the spectrum, combined linearly at random.
Hyperspectral imagery has a lot of potential because most of the data gathered is redundant in the first place. As far I can tell, in this example, the large dataset is compressed using random projections and they are then using machine learning techniques (diffusion methods/maps) in order to figure out the neighborhood of a sample given labeled training sets. More information can be found in the presentation of Coifman at IMA.
* an Analog imager at Georgia Tech but I could not find a trace of it for the moment.
The analog route is a fascinating one for two reasons:
- one can use random projections but one can also use other bases. It used to be that you could find analog sensor that were decomposing data directly in their fourier spectrum.
- analog used to be the way to go before the digital revolution and so there is a tremendous knowledge in acquiring analog signals. Compressed sensing opens the door to using these systems again.
Monday, July 09, 2007
Adding Search and Rescue Capabilities (part I): Using Hyperspectral and Multispectral Cameras to Search for Non-Evading Targets
- the boat and its wake span a large part of the pixel and using a few bands, one can see a large difference between a man made object and the sea. In other words, the underlying scene is very sparse and one in effect detect very rapidly interesting artifacts. This a little bit like superresolution.
- In some cameras like Hyperion on EO-1 or Meris (Envisat) there are 250 spectral channels. Even if the spatial resolution is coarse, we are bound to see something different when using 250 bands (as opposed to the traditional three color bands) especially against a very uniform background (sea). Techniques such as the ones developed by Mauro Maggioni and Ronald Coifman should be evaluated for that purpose.
Recall that the idea is to produce a map of what is potentially interesting, not an exact match of where that target is. A second step dealing with data fusion is responsible for eliminating the false positives given information from other sensors. With the help of Lawrence Ong, Sorin Popescu and I showed that you could see boats with Landsat 7. This is new but falls into the first category highlighted above. The second category has not been investigated as far as I know: Maybe it should. There are three categories of targets/signatures that should be investigated:
- During the Tenacious search, a false positive was detected in the form of a sighting of a green dye. These dye are generally part of "distress kits" and used whenever a boat want to make it clear it has problem. While it was a false positive for other reasons, I had a discussion with the EO-1 folks (at JPL and Goddard) who mentioned that maybe producing ground truth data with green dye and Hyperion could probably lead to having a similar capability than the one we currently have for detecting volcanoes. In other words, produce a calibration formula to be fed to EO-1 so that in the future, its autonomous detection capability can provide data to the ground that this green dye has been detected over a specific area. Since one can schedule imagery on EO-1 and figure out Envisat data gathering cycle, this could be done as a small private endeavor.
- Another signature of interest is that of the boat as produced on the camera. If it is a boat or a plane, it is very likely that these have been imaged before by the same camera over the same harbour or airport at some other time. But for the latter, a signature is not really that important per se. A large signal over background noise on some channels should be enough to find that boat/plane. In the case of the plane, the signature may be interesting as the background is generally cluttered.
- In order to verify the ability to find current boats at sea, one could try to locate the boats currently involved in much advertized journeys or races. One could find out from the current stock of envisat and eo-1 photos whether boats like the Schooner Anne can be located. That boat is part of a 1000 days at sea journey. They have a map of their location day after day. The boat is a schooner (or about 120 feet large).
Another item that would have sped up the search is the ability to query simultaneously different databases on the availability of hyperspectral or multispectral images from different platforms. Either USGS or the ESA platforms are very nice, but making it into one search would have been a nice time saver. I am also pretty sure that there are other Earth Observation platforms from Asia (India in particular) that could have been used, provided I knew about them. Yet I cannot find anywhere on the web a catalog of civilian hyperspectral or multispectral imagers on current satellites.
Finally, let us recall that doing this can help us locate hard cases like the Tenacious but it may also help us in a totally different endeavor. As one can see from the extraordinary effort of the Coast Guards for the Tenacious, one boat can consume a large amount of man power. Let us imagine a case where you have to do the tracking of 8000 targets lost at sea.
In the search for Cessna N2700Q, the Civil Air Patrol tried the new ARCHER system without success on that search. And it looks like this is not happening only for this search as some people are doubting its capability for Search and Rescue Operations.
As indicated by Bradley,
CAP forum posts indicate ARCHERThis is the same problem that arises for EO-1, the swath of interest is generally very narrow compared to the size of the problem. We should probably think of a way of integrating Compressed Sensing into current hyperspectral imagery to increase the field of view. Let us recall that one of the reason this would be interesting is that these systems are there to point out major differences from the background, they are not there to produce very nice imagery.
requires a very narrow search area to be of much use. Our problem is that we're not sure where this Cessna pilot went after he dropped
below radar (N34° 48' 7" , W111° 56' 52").
If any of those items are of interest to you please contact me. I am particularly interested in people (from Asia, Europe or the U.S.) that can have direct access to this type of imagery so we can test some of what is said in this entry.
[ Si vous pensez que ce sujet est important et qu'il doit etre etudie, je serais tres heureux de pouvoir vous aider. N'hesitez pas a me contacter ]
Wednesday, July 04, 2007
Tuesday, July 03, 2007
- Paul de Faget de Casteljau, Ingenieur chez Citroen, issu de Normale Sup
- Pierre Bezier, Ingenieur chez Renault, issu de l'ENSAM
- Adrien-Marie Legendre professeur a Polytechnique
- Jean Morlet (qui vient de deceder), geophysicien a ELF, issu de Polytechnique,
- Yousef Saad, francophone qui est passe par le cycle francais a l'Universite de Grenoble
"...The inventors were/are almost all academic mathematicians
- Most were extremely eminent
- Their great discoveries came at all ages
- About half had major involvements with government or industry (That’s big industry—AT&T, IBM, Boeing, etc.— and big government labs like Argonne, Harwell, NPL)
- Most were seriously involved with applications
- It’s hard to disentangle the effects of WWII.."
- Legendre, il y a deux siecles
- Bezier, De CastelJau et Morlet qui ont fait partie de grandes entreprises francaises apres la deuxieme guerre mondiale. Ils sont tous decedes.
- Saad n'est pas francais (il est etranger dans le sens ou il n'a pas fait sa scolarite secondaire en France et donc n'a pu etre detecte par le systeme des prepas) mais il est passe par le systeme universitaire francais pour continuer sa carriere aux U.S. Candes fait sa carriere aux U.S. Mallat a evolue aux U.S. et est revenu en france et a cree une entreprise qui utilise la technologie qu'il a develope pendant ses recherches les plus recentes. (pour la petite histoire, Mallat relate son histoire pour le developement de technologies innovates en France dans : Tribune Libre sur la Recherche et l'Innovation, Gazette des Mathematiciens of the SMF, no. 121, July 2009. pdf)
Morlet's scientific vision was decisive for the success of wavelet analysis. But as Pierre Goupillaud is telling us, this had no effect on Morlet's career at the Elf-Aquitaine company. In fact Morlet's only reward for years of perseverance and creativity in producing this extraordinary tool was an early retirement.
Tamás Sarlós describes in this paper the use of random projections through the Fast JL transform so that one can reduce the size of very large matrices in order to compute their SVD. A related approach was taken by Thomas Strohmer and Roman Vershynin  to build a ranodmized solver with exponential convergence. It sounds like as if there is a potential for a Random Projection LAPACK (RP-LAPACK) but I wonder how these transformations will fare when confronted with Non-normal matrices or discretization of Non-normal operators. In this context, Non-normality has nothing to do with probability distributions but rather a way to describe Non hermitian matrices (or matrices/operators that do not commute with their transpose). In the case of high non-normality, these matrices have very wide spectral portraits. Corollary 4 of Sarlos' paper would point to the potential blow-up in the case of highly non-normal matrices, a situation not unlike the smashed filter case, where noisy data can overcome the tight bound on the projection in the lower dimension random space (theorem 1 of the smashed filter paper). High nonnormality occur whenever there exists a strong coupling between two or more phenomena, giving rise to high spectral instabilities. Examples include linear transport theory for neutrons but other applications where non-normality shows up are listed here. Using these matrices would help in figuring out how bad the problem is.
On a different note, Nick Trefethen pointed out that both some Random matrices and Toeplitz matrices have large pseudo-spectra:
The eigenvalues of finite banded Toeplitz matrices lie on curves in the complex plane that have been characterized by Schmidt and Spitzer [SS60]. In contrast, the spectrum of the corresponding infinite dimensional Toeplitz operator is the set of points in the complex plane that a(T) encloses with non-zero winding number; see Theorem 1.17 of [BS99]. The limit of the spectrum of banded Toeplitz matrices as the dimension goes to infinity is thus typically very different from the spectrum of the limit.
This uncomfortable situation can be resolved by considering the behavior of pseudospectra. Though the eigenvalues of the finite Toeplitz matrices may fall on curves in the complex plane, Landau [Lan75], Reichel and Trefethen [RT92b], and Böttcher [Böt94] have observed that the resolvent norm ||(zI-TN)-1|| grows exponentially in the matrix dimension for all z in the interior of the spectrum of the corresponding infinite dimensional operator. (As a result, it is difficult to accurately compute eigenvalues of non-symmetric banded Toeplitz matrices even for matrices of relatively modest dimensions, and this signals a warning against basing analysis of finite Toeplitz matrices based on eigenvalues alone.) Furthermore the above cited authors also concluded that the pseudospectra of TN converge to the pseudospectra of T.
Could Pseudo-spectra provide directions on successful implementation of compressive sampling using Toeplitz and Random matrices ? Does the fact that Random Matrices replacing Toeplitz matrices in compressed sensing help in thinking how one can devise a Deterministic Constructions of Compressed Sensing Matrices ? This would not be so outlandish since:
The Restricted Isometry Property is a condition on the spectral norm of the matrices A^T = Phi* Phi[ Update: Terry Tao just wrote to explain the issue with building deterministic UUP matrices ]
On a totally different note, I just found this paper on Identification of Matrices having a Sparse Representation by Holger Rauhut, Gotz Pfander and Jared Tanner. Initially derived for wireless communication inverse problem and sonar, it could readily be applied to neutron transport as well.
 T. Strohmer, R. Vershynin, A randomized solver for linear systems with exponential convergence, RANDOM 2006 (10th International Workshop on Randomization and Computation), Springer Lecture Notes in Computer Science 4110, 499--507. Journal version: A randomized Kaczmarz algorithm with exponential convergence, Journal of Fourier Analysis and Applications, to appear
Monday, July 02, 2007
I don't remember it so clearly described but the updated page on Gradient Projection for SparseReconstruction: Application to Compressed Sensing and Other Inverse Problems by Mario Figueiredo, Robert Nowak, Stephen Wright features the new updated paper on the technique and how it compares with other techniques. Besides explaining the algorithm, the authors must be thanked for making clear the connection between compressed sensing and other techniques such as LASSO that are using the L1 norm and for what purpose. I touched upon the reason for using L1 before but it was not framed in this exceptionally clear "background" discussion. Thank you guys.