## Tuesday, May 31, 2016

### Review: RandNLA: Randomized Numerical Linear Algebra

If you've been reading Nuit Blanche for a while, you have heard of RandNLA as there have been about 75 blog entries on the subject. It looks like a review was overdue to present the matter to a wider audience. Petros and Michael who were among the ones starting the field, just did that in a review of the CACM. Here is it:

It starts like this:

MATRICES ARE UBIQUITOUS in computer science, statistics, and applied mathematics. An m×n matrix can encode information about m objects (each described by n features), or the behavior of a discretized differential operator on a finite element mesh; an n× n positive-definite matrix can encode the correlations between all pairs of n objects, or the edge-connectivity between all pairs of nodes in a social network; and so on. Motivated largely by technological developments that generate extremely large scientific and Internet datasets, recent years have witnessed exciting developments in the theory and practice of matrix algorithms. Particularly remarkable is the use of randomization —typically assumed to be a property of the input data due to, for example, noise in the data generation mechanisms—as an algorithmic or computational resource for the development of improved algorithms for fundamental matrix problems such as matrix multiplication, least-squares (LS) approximation, low-rank matrix approximation, and Laplacian-based linear equation solvers......

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Phase Retrieval: Robust phase retrieval with the swept approximate message passing (prSAMP) algorithm / On The 2D Phase Retrieval Problem

Today, we have two preprints on phase retrieval. The first one investigates the use of an AMP solver with a binary measurmeent matrix while the second looks at how a 2D phase retrieval problem can be solved with a analytical approach when it is unique (with no sparsity assumption).

Robust phase retrieval with the swept approximate message passing (prSAMP) algorithm by Boshra Rajaei, Sylvain Gigan, Florent Krzakala, Laurent Daudet

In phase retrieval, the goal is to recover a complex signal from the magnitude of its linear measurements. While many well-known algorithms guarantee deterministic recovery of the unknown signal using i.i.d. random measurement matrices, they suffer serious convergence issues some ill-conditioned matrices. As an example, this happens in optical imagers using binary intensity-only spatial light modulators to shape the input wavefront. The problem of ill-conditioned measurement matrices has also been a topic of interest for compressed sensing researchers during the past decade. In this paper, using recent advances in generic compressed sensing, we propose a new phase retrieval algorithm that well-adopts for both Gaussian i.i.d. and binary matrices using both sparse and dense input signals. This algorithm is also robust to the strong noise levels found in some imaging applications.

On The 2D Phase Retrieval Problem by Dani Kogan, Yonina C. Eldar, Dan Oron

The recovery of a signal from the magnitude of its Fourier transform, also known as phase retrieval, is of fundamental importance in many scientific fields. It is well known that due to the loss of Fourier phase the problem in 1D is ill-posed. Without further constraints, there is no unique solution to the problem. In contrast, uniqueness up to trivial ambiguities very often exists in higher dimensions, with mild constraints on the input. In this paper we focus on the 2D phase retrieval problem and provide insight into this uniqueness property by exploring the connection between the 2D and 1D formulations. In particular, we show that 2D phase retrieval can be cast as a 1D problem with additional constraints, which limit the solution space. We then prove that only one additional constraint is sufficient to reduce the many feasible solutions in the 1D setting to a unique solution for almost all signals. These results allow to obtain an analytical approach (with combinatorial complexity) to solve the 2D phase retrieval problem when it is unique.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Monday, May 30, 2016

### Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks

Chris just sent me the following:
Hi Igor-

In earlier communications you've been interested in the intersections of compressed sensing and neural computation.  We've just released a preprint that adds one more to the list: low rank factorization.  The preprint below extends our earlier work on determining the memory capacity or recurrent neural networks (RNNs).  RNNs are being used as layers in deep networks when they want to introduce some memory to handle time-varying inputs.

Using random matrix techniques, we can show the memory capacity bounds on one type of RNN (linear echo state networks) with multidimensional inputs.  Those inputs can be sparse in a basis, or have dependencies that result in a low-rank matrix (with no sparsity).  In either case, we show that the network size must scale linearly with the information rate in the data, resulting in networks that can be much smaller than the dimension of the input being remembered.

http://arxiv.org/abs/1605.08346

regards,
chris
Thanks And all this with phase transition diagrams, woohoo !

Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks by  Adam Charles, Dong Yin, Christopher Rozell

Recurrent neural networks (RNNs) have drawn interest from machine learning researchers because of their effectiveness at preserving past inputs for time-varying data processing tasks. To understand the success and limitations of RNNs, it is critical that we advance our analysis of their fundamental memory properties. We focus on echo state networks (ESNs), which are RNNs with simple memoryless nodes and random connectivity. In most existing analyses, the short-term memory (STM) capacity results conclude that the ESN network size must scale linearly with the input size for unstructured inputs. The main contribution of this paper is to provide general results characterizing the STM capacity for linear ESNs with multidimensional input streams when the inputs have common low-dimensional structure: sparsity in a basis or significant statistical dependence between inputs. In both cases, we show that the number of nodes in the network must scale linearly with the information rate and poly-logarithmically with the ambient input dimension. The analysis relies on advanced applications of random matrix theory and results in explicit non-asymptotic bounds on the recovery error. Taken together, this analysis provides a significant step forward in our understanding of the STM properties in RNNs.

Because somehow these phase diagrams are the great equalizers.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Saturday, May 28, 2016

### Saturday Morning Videos: International Conference on Learning Representations (ICLR) 2016, San Juan

Hugo mentioned it on his twitter feed, the videos of ICLRs are out. The papers are listed here.

Opening Remarks

06:43 Opening, Hugo Larochelle

Keynote Talks

Best Paper Awards

Lectures

Credits: NASA/JHUAPL/SwRI

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Saturday Morning Video: Proving them wrong, Space X First-stage landing from the Onoard Camera

After the success of the DCX, I still remember vividly a certain amount of annoyance at this Aerospace MIT professor who testified before Congress on his doubts about SSTO and related concepts (reusable TSTO). As in any other things in life, sometimes the best way to fight something is to simply prove them dead wrong. Evidence #4628,

Space X First-stage landing (congrats Damaris and Andrew)

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Friday, May 27, 2016

### A proposal for a NIPS Workshop or Symposium on "Mapping Machine Learning to Hardware"

So now that the deadline for the papers at NIPS is over, I am interested in organizing a workshop on how Machine Learning is being mapped to Hardware. If you are interested or know somebody who is potentially interested, please get in touch with me and we will improve on that proposal (and yes the title can change too). Here is the first draft:

Dear Colleagues,

Mapping Machine Learning to Hardware

With the recent successes of Deep Learning, we are beginning to see a set of new specialized hardware dedicated to making some of these computations faster, or energy efficient or both. These technologies use either CMOS (CPU, GPUs, FPGAs, ASICs) or exotic technologies (Bio, Memristors, Quantum Chips, Photonics, etc…) as they seek to address a specific trade-off in mapping Machine Learning algorithms to a specific hardware technology.

Conversely there has been quite an effort on the empirical side at devising deep network architectures that can handle binary coefficients so as to be efficiently implementable on low complexity hardware.

A somewhat related issue is the recent interest of the sensing community to map the first layers of sensing hardware with the first layers of models such as deep networks. This approach has the potential of changing the way image reconstruction and signal processing will be performed in the future.

This workshop will bring together researchers at the interface of machine learning, hardware implementation, sensing, physics and biology.

The goals of the workshop are
• to present how machine learning computations and algorithms are mapped and improved as a result of new hardware technologies and architectures
• to evaluate the different trade-offs currently investigated in these approaches
• to understand how sensors may change as a result of this mapping to Machine Learning algorithms.
• to evaluate the constraints put forth on recent deep learning architectures so as to reduce redundancy and enable a simpler mapping between computing hardware and models.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Riemannian stochastic variance reduced gradient on Grassmann manifold - implementation -

Bamdev just sent me the following:

Dear Igor,

I wish to share our recent technical report on "Riemannian stochastic variance reduced gradient on Grassmann manifold", available at http://arxiv.org/abs/1605.07367. In this paper, we extend the Euclidean SVRG algorithm to compact Riemannian manifolds. The results are encouraging.

Additionally, the codes are available at https://bamdevmishra.com/codes/rsvrg/. We also provide a template file, Riemannian_svrg.m, that is compatible with the Manopt toolbox [1].

Regards,
Bamdev
[1] http://manopt.org

Thanks Bamdev ! Here is the paper: Riemannian stochastic variance reduced gradient on Grassmann manifold by Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra

Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite, number of loss functions. In this paper, we propose a novel Riemannian extension of the Euclidean stochastic variance reduced gradient algorithm (R-SVRG) to a compact manifold search space. To this end, we show the developments on the Grassmann manifold. The key challenges of averaging, addition, and subtraction of multiple gradients are addressed with notions like logarithm mapping and parallel translation of vectors on the Grassmann manifold. We present a global convergence analysis of the proposed algorithm with decay step-sizes and a local convergence rate analysis under fixed step-size with some natural assumptions. The proposed algorithm is applied on a number of problems on the Grassmann manifold like principal components analysis, low-rank matrix completion, and the Karcher mean computation. In all these cases, the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Thursday, May 26, 2016

### Harvesting Nature: from Computational Imaging to Optical Computing

Today, at the 2016 SIAM Conference on Imaging Science in Albuquerque, Laurent, the CTO and Co-Founder of LightOn will be presenting at the Computational Imaging Systems minisymposium (9:30 AM-11:30 AM, Room:Alvarado Ballroom C).
Among the work Laurent will present, he will mention our earlier Proof of Concept (Random Projections through multiple optical scattering: Approximating kernels at the speed of light,)
I am currently thinking about submitting a workshop or symposium around Machine Learning and Hardware at NIPS. If you are interested or know a similar endeavor, please get in touch and let's make this happen.
This minisymposium emphasizes the interaction between sensing hardware and computational methods in computational imaging systems.  Novel sensing hardware together with algorithms that exploit the unique properties of the data acquired by this hardware has enabled the development of imaging systems with remarkable abilities that could not be achieved via traditional methods. This minisymposium brings together four leaders in the field of computational photography, and provides a cross-section of recent advances in this area.

9:30-9:55 Macroscopic Fourier Ptychography, Oliver Cossairt, Northwestern University,
USA; Jason R. Holloway and Ashok Veeraraghavan, Rice University, USA; Manoj Sharma, Northwestern University, USA; Salman Asif, Rice University, USA; Nathan Matsuda, Northwestern University, USA; Roarke Horstmeyer, California Institute of
Technology, USA

10:00-10:25 Lensless Imaging Ashok Veeraraghavan, Rice University, USA
10:30-10:55 Photon-Efficient Reflectivity and Depth Imaging under Significant Ambient Light Vivek K. Goyal, Boston University, USA
11:00-11:25 Harvesting Nature: from Computational Imaging to Optical Computing
Laurent Daudet, Université Paris-Diderot, France
Thank you Mario for the invitation.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Tuesday, May 24, 2016

### Compressively characterizing high-dimensional entangled states with complementary, random filtering

How is compressive sensing changing the world ? Here is one way: it makes quantum mechanics simpler to understand: Compressively characterizing high-dimensional entangled states with complementary, random filtering by  Gregory A. Howland, Samuel H. Knarr, James Schneeloch, Daniel J. Lum, John C. Howell

The resources needed to conventionally characterize a quantum system are overwhelmingly large for high- dimensional systems. This obstacle may be overcome by abandoning traditional cornerstones of quantum measurement, such as general quantum states, strong projective measurement, and assumption-free characterization. Following this reasoning, we demonstrate an efficient technique for characterizing high-dimensional, spatial entanglement with one set of measurements. We recover sharp distributions with local, random filtering of the same ensemble in momentum followed by position---something the uncertainty principle forbids for projective measurements. Exploiting the expectation that entangled signals are highly correlated, we use fewer than 5,000 measurements to characterize a 65, 536-dimensional state. Finally, we use entropic inequalities to witness entanglement without a density matrix. Our method represents the sea change unfolding in quantum measurement where methods influenced by the information theory and signal-processing communities replace unscalable, brute-force techniques---a progression previously followed by classical sensing.

Previously:
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Thesis: Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks - implementation -

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks by  Philipp Gysel

Convolutional neural networks (CNN) have achieved major breakthroughs in recent years. Their performance in computer vision have matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however this ability comes at the cost of high computational and memory requirements. State-of-art networks require billions of arithmetic operations and millions of parameters. To enable embedded devices such as smartphones, Google glasses and monitoring cameras with the astonishing power of deep learning, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. A first important step of accelerator design is hardware-oriented approximation of deep networks, which enables energy-efficient inference. We present Ristretto, a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and outputs of resource-intense layers, which reduces the chip area for multiplication units significantly. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in an adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.

The page for the Ristretto project is at: http://lepsucd.com/?page_id=621
From the page:

Ristretto is an automated CNN-approximation tool which condenses 32-bit floating point networks. Ristretto is an extention of Caffe and allows to test, train and finetune networks with limited numerical precision.

# Ristretto In a Minute

• Ristretto Tool: The Ristretto tool performs automatic network quantization and scoring, using different bit-widths for number representation, to find a good balance between compression rate and network accuracy.
• Ristretto Layers: Ristretto reimplements Caffe-layers with quantized numbers.
• Testing and Training: Thanks to Ristretto’s smooth integration into Caffe, network description files can be manually changed to quantize different layers. The bit-width used for different layers as well as other parameters can be set in the network’s prototxt file. This allows to directly test and train condensed networks, without any need of recompilation.

# Approximation Schemes

Ristretto allows for three different quantization strategies to approximate Convolutional Neural Networks:
• Dynamic Fixed Point: A modified fixed-point format with more flexibility.
• Mini Floating Point: Bit-width reduced floating point numbers.
• Power-of-two parameters: Layers with power-of-two parameters don’t need any multipliers, when implemented in hardware.

# Cite us

Our approximation framework was presented in an extended abstract at ICLR’16. Check out our poster. All results can be reproduced with our code on Github. If Ristretto helps your research project, please cite us:
1. @article{gysel2016hardware,
2. title={Hardware-oriented Approximation of Convolutional Neural Networks},
3. author={Gysel, Philipp and Motamedi, Mohammad and Ghiasi, Soheil},
4. journal={arXiv preprint arXiv:1604.03168},
5. year={2016}
6. }

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Monday, May 23, 2016

### The Shallow Side of Deep Learning

On his Twitter feed, Atlas mentioned three preprints: the first allows the building of deeper construction out of kernels (shallow methods). The next one shows how some deep nets can be considered as some sorts of ensemble of shallower nets. Finally, the last one makes the case that  dropout and stochastic depth are somehow equivalent to averaging over shallower networks.

In this paper, we propose a new image representation based on a multilayer kernel machine that performs end-to-end learning. Unlike traditional kernel methods, where the kernel is handcrafted or adapted to data in an unsupervised manner, we learn how to shape the kernel for a supervised prediction problem. We proceed by generalizing convolutional kernel networks, which originally provide unsupervised image representations, and we derive backpropagation rules to optimize model parameters. As a result, we obtain a new type of convolutional neural network with the following properties: (i) at each layer, learning filters is equivalent to optimizing a linear subspace in a reproducing kernel Hilbert space (RKHS), where we project data, (ii) the network may be learned with supervision or without, (iii) the model comes with a natural regularization function (the norm in the RKHS). We show that our method achieves reasonably competitive performance on some standard "deep learning" image classification datasets such as CIFAR-10 and SVHN, and also state-of-the-art results for image super-resolution, demonstrating the applicability of our approach to a large variety of image-related tasks.

Residual Networks are Exponential Ensembles of Relatively Shallow Networks by Andreas Veit, Michael Wilber, Serge Belongie

In this work, we introduce a novel interpretation of residual networks showing they are exponential ensembles. This observation is supported by a large-scale lesion study that demonstrates they behave just like ensembles at test time. Subsequently, we perform an analysis showing these ensembles mostly consist of networks that are each relatively shallow. For example, contrary to our expectations, most of the gradient in a residual network with 110 layers comes from an ensemble of very short networks, i.e., only 10-34 layers deep. This suggests that in addition to describing neural networks in terms of width and depth, there is a third dimension: multiplicity, the size of the implicit ensemble. Ultimately, residual networks do not resolve the vanishing gradient problem by preserving gradient flow throughout the entire depth of the network - rather, they avoid the problem simply by ensembling many short networks together. This insight reveals that depth is still an open research question and invites the exploration of the related notion of multiplicity.

Swapout: Learning an ensemble of deep architectures by  Saurabh Singh, Derek Hoiem, David Forsyth

We describe Swapout, a new stochastic training method, that outperforms ResNets of identical network structure yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to exiting architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient training method and validate our conclusions on CIFAR-10 and CIFAR-100 matching state of the art accuracy. Remarkably, our 32 layer wider model performs similar to a 1001 layer ResNet model.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Saturday, May 21, 2016

### Saturday Morning Video: Solvable Model of Unsupervised Feature Learning, Lenka Zdeborova @ Simons Institute

Lenka who will be at our next Paris Machine Learning meetup on June 8th, was recently speaking at the Simons institute's Random Instances and Phase Transitions workshop within the trimester long Counting Complexity and Phase Transitions program. Here is Solvable Model of Unsupervised Feature Learning by Lenka Zdeborova,

We introduce factorization of certain randomly generated matrices as a probabilistic model for unsupervised feature learning. Using the replica method we obtain an exact closed formula for the minimum mean-squared error and sample complexity for reconstruction of the features from data. An alternative way to derive this formula is via state evolution of the associated approximate message passing. Analyzing the fixed points of this formula we identify a gap between computationally tractable and information theoretically optimal inference, associated to a first order phase transition. The model can serve for benchmarking generic-purpose feature learning algorithms, its analysis provides insight into theoretical understanding of unsupervised learning from data.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Friday, May 20, 2016

### ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels

Yes ! The great convergence in action. Merging sensing with the first layer of a deep learning architecture:

ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels by Huaijin Chen, Suren Jayasuriya, Jiyue Yang, Judy Stephen, Sriram Sivaramakrishnan, Ashok Veeraraghavan, Alyosha Molnar

Deep learning using convolutional neural networks (CNNs) is quickly becoming the state-of-the-art for challenging computer vision applications. However, deep learning's power consumption and bandwidth requirements currently limit its application in embedded and mobile systems with tight energy budgets. In this paper, we explore the energy savings of optically computing the first layer of CNNs. To do so, we utilize bio-inspired Angle Sensitive Pixels (ASPs), custom CMOS diffractive image sensors which act similar to Gabor filter banks in the V1 layer of the human visual cortex. ASPs replace both image sensing and the first layer of a conventional CNN by directly performing optical edge filtering, saving sensing energy, data bandwidth, and CNN FLOPS to compute. Our experimental results (both on synthetic data and a hardware prototype) for a variety of vision tasks such as digit recognition, object recognition, and face identification demonstrate 97% reduction in image sensor power consumption and 90% reduction in data bandwidth from sensor to CPU, while achieving similar performance compared to traditional deep learning pipelines.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Thursday, May 19, 2016

### FISTA and "Maximal Sparsity with Deep Networks?", Learning MMSE Optimal Thresholds for FISTA

Ulugbek just sent me the following:

Hi Igor,

I just saw your today’s post on Nuit Blanche on "Maximal Sparsity with Deep Networks?”.

You mentioned both FISTA and AMP, and the number of iterations required for convergence. I generally agree with your observations about studies on FISTA/AMP being hugely insightful. We have been investigating this from our side with Hassan, and I wanted to share a small update (see attached), which will be presented at iTWIST 2016 in Denmark this August. We haven’t uploaded the manuscript to arXiv as iTWIST anyways publishes its proceedings there.

Going from ISTA to FISTA is quite advantageous. We initialize the algorithm with 0 and also set the initial regularizer to 0, i.e. use identity operator instead of a shrinkage (which corresponds to having least-squares estimator) and converge much faster towards a regularizer that promotes sparsity.

Thank you having us all informed here,
Kind Regards,
- Ulugbek
Thanks Ulugbek and  Hassan ! Here is the paper:

Fast iterative shrinkage/thresholding algorithm (FISTA) is one of the most commonly used methods for solving linear inverse problems. In this work, we present a scheme that enables learning of optimal thresholding functions for FISTA from a set of training data. In particular, by relating iterations of FISTA to a deep neural network (DNN), we use the error backpropagation algorithm to find thresholding functions that minimize mean squared error (MSE) of the reconstruction for a given statistical distribution of data. Accordingly, the scheme can be used to computationally obtain MSE optimal variant of FISTA for performing statistical estimation.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Maximal Sparsity with Deep Networks?

Back on the longest night of 2015 (see The Great Convergence in Action: Learning optimal nonlinearities for iterative thresholding algorithms ), I asked if Winter was coming to applied mathematics as one can witness a convergence of people looking at changing parameters of iterative solvers with Deep Learning techniques. We also have mentioned the following parallel here on Nuit Blanche for a while. Anyway, it is put in better words here:

....Although seemingly unrelated at rst glance, the layers of a deep neural network (DNN) can be viewed as iterations of some algorithm that have been unfolded into a network structure (Gregor and LeCun, 2010; Hershey et al., 2014). In particular, iterative thresholding approaches such as those mentioned above typically involve an update rule comprised of a xed, linear filter followed by a non-linear activation function that promotes sparsity. Consequently, algorithm execution can be interpreted as passing an input through an extremely deep network with constant layer weights (dependent on) at every layer....
Here is the paper. My remarks are below:Maximal Sparsity with Deep Networks? by Bo Xin, Yizhou Wang, Wen Gao, David Wipf
The iterations of many sparse estimation algorithms are comprised of a fixed linear filter cascaded with a thresholding nonlinearity, which collectively resemble a typical neural network layer. Consequently, a lengthy sequence of algorithm iterations can be viewed as a deep network with shared, hand-crafted layer weights. It is therefore quite natural to examine the degree to which a learned network model might act as a viable surrogate for traditional sparse estimation in domains where ample training data is available. While the possibility of a reduced computational budget is readily apparent when a ceiling is imposed on the number of layers, our work primarily focuses on estimation accuracy. In particular, it is well-known that when a signal dictionary has coherent columns, as quantified by a large RIP constant, then most tractable iterative algorithms are unable to find maximally sparse representations. In contrast, we demonstrate both theoretically and empirically the potential for a trained deep network to recover minimal $\ell_0$-norm representations in regimes where existing methods fail. The resulting system is deployed on a practical photometric stereo estimation problem, where the goal is to remove sparse outliers that can disrupt the estimation of surface normals from a 3D scene.
It's fascinating. Three items:
• The paper ought to mention similar efforts such as the one mentioned earlier here (
• While ISTA and related algorothms are interesting, AMP solvers that have similar structure ought to be investigated as it is expected that the number of iterations is much smaller than that of traditional FISTA et al.
• The following two earlier papers were focused on sparse coding and NMF (they are listed below) It would seem to me that much more insight could be gotten out of the studies on FISTA et related algorithms for compressive sensing but I can imagine I have a bias on that matter.

Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures by John R. Hershey, Jonathan Le Roux, Felix Weninger

Model-based methods and deep neural networks have both been tremendously successful paradigms in machine learning. In model-based methods, problem domain knowledge can be built into the constraints of the model, typically at the expense of difficulties during inference. In contrast, deterministic deep neural networks are constructed in such a way that inference is straightforward, but their architectures are generic and it is unclear how to incorporate knowledge. This work aims to obtain the advantages of both approaches. To do so, we start with a model-based approach and an associated inference algorithm, and \emph{unfold} the inference iterations as layers in a deep network. Rather than optimizing the original model, we \emph{untie} the model parameters across layers, in order to create a more powerful network. The resulting architecture can be trained discriminatively to perform accurate inference within a fixed network size. We show how this framework allows us to interpret conventional networks as mean-field inference in Markov random fields, and to obtain new architectures by instead using belief propagation as the inference algorithm. We then show its application to a non-negative matrix factorization model that incorporates the problem-domain knowledge that sound sources are additive. Deep unfolding of this model yields a new kind of non-negative deep neural network, that can be trained using a multiplicative backpropagation-style update algorithm. We present speech enhancement experiments showing that our approach is competitive with conventional neural networks despite using far fewer parameters.

Learning Fast Approximations of Sparse Coding by Karol Gregor , Yann Lecun

In Sparse Coding (SC), input vectors are reconstructed using a sparse linear combination of basis vectors. SC has become a popular method for extracting features from data. For a given input, SC minimizes a quadratic reconstruction error with an L1 penalty term on the code. The process is often too slow for applications such as real-time pattern recognition. We proposed two versions of a very fast algorithm that produces approximate estimates of the sparse code that can be used to compute good visual features, or to initialize exact iterative algorithms. The main idea is to train a non-linear, feed-forward predictor with a specific architecture and a fixed depth to produce the best possible approximation of the sparse code. A version of the method, which can be seen as a trainable version of Li and Osher’s coordinate descent method, is shown to produce approximate solutions with 10 times less computation than Li and Osher’s for the same approximation error. Unlike previous proposals for sparse code predictors, the system allows a kind of approximate “explaining away ” to take place during inference. The resulting predictor is differentiable and can be included into globallytrained recognition systems. 1.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Wednesday, May 18, 2016

### Performance Trade-Offs in Multi-Processor Approximate Message Passing

Nice ! What if you want to use a fast solver like AMP on a distributed architecture or a sensor network topology, the following paper investigate the trade-offs associated with that.

Performance Trade-Offs in Multi-Processor Approximate Message Passing by Junan Zhu, Ahmad Beirami, Dror Baron

We consider large-scale linear inverse problems in Bayesian settings. Our general approach follows a recent line of work that applies the approximate message passing (AMP) framework in multi-processor (MP) computational systems by storing and processing a subset of rows of the measurement matrix along with corresponding measurements at each MP node. In each MP-AMP iteration, nodes of the MP system and its fusion center exchange lossily compressed messages pertaining to their estimates of the input. There is a trade-off between the physical costs of the reconstruction process including computation time, communication loads, and the reconstruction quality, and it is impossible to simultaneously minimize all the costs. We pose this minimization as a multi-objective optimization problem (MOP), and study the properties of the best trade-offs (Pareto optimality) in this MOP. We prove that the achievable region of this MOP is convex, and conjecture how the combined cost of computation and communication scales with the desired mean squared error. These properties are verified numerically.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Performance Limits for Noisy Multi-Measurement Vector Problems

Here is some analysis (and attendant phase diagrams) for BP and AMP solvers in the MMV and SMV (CS) cases. As opposed to previous work featuring phase diagrams, this analysis goes into the noisy setting. It's been added to the Advanced Matrix Factorization Jungle page.

Performance Limits for Noisy Multi-Measurement Vector Problems by Junan Zhu, Dror Baron, Florent Krzakala

Compressed sensing (CS) demonstrates that sparse signals can be estimated from under-determined linear systems. Distributed CS (DCS) is an area of CS that further reduces the number of measurements by considering joint sparsity within signal ensembles. DCS with jointly sparse signals has applications in multi-sensor acoustic sensing, magnetic resonance imaging with multiple coils, remote sensing, and array signal processing. Multi-measurement vector (MMV) problems consider the estimation of jointly sparse signals under the DCS framework. Two related MMV settings are studied. In the first setting, each signal vector is measured by a different independent and identically distributed (i.i.d.) measurement matrix, while in the second setting, all signal vectors are measured by the same i.i.d. matrix. Replica analysis is performed for these two MMV settings, and the minimum mean squared error (MMSE), which turns out to be identical for both settings, is obtained as a function of the noise variance and number of measurements. Multiple performance regions for MMV are identified where the MMSE behaves differently as a function of the noise variance and the number of measurements. Belief propagation (BP) is a CS signal estimation framework that often achieves the MMSE asymptotically. A phase transition for BP is identified. This phase transition, verified by numerical results, separates the regions where BP achieves the MMSE and where it is suboptimal. Numerical results also illustrate that more signal vectors in the jointly sparse signal ensemble lead to a better phase transition. To showcase the application of MMV models, the MMSE of complex CS problems with both real and complex measurement matrices is analyzed.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.