## Wednesday, July 27, 2016

### gvnn: Neural Network Library for Geometric Computer Vision

I learned about Spatial transformers at a Deep Learning meetup in Paris from a presentation by Alban Desmaison (yes I sometimes do attend other awesome meetups). Spatial transformers are "aiming at boosting the geometric invariance of CNNs" Transformers are using transformations from computer vision and camera models to help deep neural networks learn better or faster. They do so by providing models behind generic invariances like rotation, translation etc.... It is a little bit unsettling since most defenders of deep neural architectures would rather add data than get help from vision  models :-) Anyway, let's see this as another instance of the Great Convergence where Machine Learning and Computer graphics, the old computer vision paradigm and signal processing meet. Today, we have a library written in Torch that describes several transformation for inclusion in neural networks.

Let me just add that while these transformers do help neural networks in reducing the number of network coefficients, the remaining coefficients are probably summarizing many of the geometric invariances that we collectively have not yet discovered -this is a sense I got from a presentation by Stephane Mallat a while back when he mentioned the connection between deep neural networks and his scattering networks-

gvnn: Neural Network Library for Geometric Computer Vision by Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, Andrew Davison

We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spirit of the original spatial transformers and allow backpropagation to enable end-to-end learning of a network involving any domain knowledge in geometric computer vision. This opens up applications in learning invariance to 3D geometric transformation for place recognition, end-to-end visual odometry, depth estimation and unsupervised learning through warping with a parametric transformation for image reconstruction error.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Tuesday, July 26, 2016

### Dual Purpose Hashing

Recent years have seen more and more demand for a unified framework to address multiple realistic image retrieval tasks concerning both category and attributes. Considering the scale of modern datasets, hashing is favorable for its low complexity. However, most existing hashing methods are designed to preserve one single kind of similarity, thus improper for dealing with the different tasks simultaneously. To overcome this limitation, we propose a new hashing method, named Dual Purpose Hashing (DPH), which jointly preserves the category and attribute similarities by exploiting the Convolutional Neural Network (CNN) models to hierarchically capture the correlations between category and attributes. Since images with both category and attribute labels are scarce, our method is designed to take the abundant partially labelled images on the Internet as training inputs. With such a framework, the binary codes of new-coming images can be readily obtained by quantizing the network outputs of a binary-like layer, and the attributes can be recovered from the codes easily. Experiments on two large-scale datasets show that our dual purpose hash codes can achieve comparable or even better performance than those state-of-the-art methods specifically designed for each individual retrieval task, while being more compact than the compared methods.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Monday, July 25, 2016

### Onsager-corrected deep learning for sparse linear inverse problems

Thomas let me know on Twitter that the Great Convergence continues, Today we find out how we go about changing the iterative process of AMP and then learn coefficients of that process as in Deep Learning. It looks like the Learned AMP beats LISTA. Looking back at the few COLT presentations earlier (Saturday videos), one wonders how these solvers change the rule of thumbs on model depth and size. To be continued....

Deep learning has gained great popularity due to its widespread success on many inference problems. We consider the application of deep learning to the sparse linear inverse problem encountered in compressive sensing, where one seeks to recover a sparse signal from a small number of noisy linear measurements. In this paper, we propose a novel neural-network architecture that decouples prediction errors across layers in the same way that the approximate message passing (AMP) algorithm decouples them across iterations: through Onsager correction. Numerical experiments suggest that our "learned AMP" network significantly improves upon Gregor and LeCun's "learned ISTA" network in both accuracy and complexity.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Saturday, July 23, 2016

### Saturday Morning Video: On the Expressive Power of Deep Learning: A Tensor Analysis, Nadav Cohen, Or Sharir, Amnon Shashua @ COLT2016

The preprint on which it relies is:

On the Expressive Power of Deep Learning: A Tensor Analysis by Nadav Cohen, Or Sharir, Amnon Shashua

It has long been conjectured that hypotheses spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical networks than with shallow ones. Despite the vast empirical evidence supporting this belief, theoretical justifications to date are limited. In particular, they do not account for the locality, sharing and pooling constructs of convolutional networks, the most successful deep learning architecture to date. In this work we derive a deep network architecture based on arithmetic circuits that inherently employs locality, sharing and pooling. An equivalence between the networks and hierarchical tensor factorizations is established. We show that a shallow network corresponds to CP (rank-1) decomposition, whereas a deep network corresponds to Hierarchical Tucker decomposition. Using tools from measure theory and matrix algebra, we prove that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be realized (or even approximated) by a shallow network. Since log-space computation transforms our networks into SimNets, the result applies directly to a deep learning architecture demonstrating promising empirical performance. The construction and theory developed in this paper shed new light on various practices and ideas employed by the deep learning community.
other recent work by Nadav include:

Convolutional Rectifier Networks as Generalized Tensor Decompositions by Nadav Cohen, Amnon Shashua

Convolutional rectifier networks, i.e. convolutional neural networks with rectified linear activation and max or average pooling, are the cornerstone of modern deep learning. However, despite their wide use and success, our theoretical understanding of the expressive properties that drive these networks is partial at best. On the other hand, we have a much firmer grasp of these issues in the world of arithmetic circuits. Specifically, it is known that convolutional arithmetic circuits possess the property of "complete depth efficiency", meaning that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be implemented (or even approximated) by a shallow network. In this paper we describe a construction based on generalized tensor decompositions, that transforms convolutional arithmetic circuits into convolutional rectifier networks. We then use mathematical tools available from the world of arithmetic circuits to prove new results. First, we show that convolutional rectifier networks are universal with max pooling but not with average pooling. Second, and more importantly, we show that depth efficiency is weaker with convolutional rectifier networks than it is with convolutional arithmetic circuits. This leads us to believe that developing effective methods for training convolutional arithmetic circuits, thereby fulfilling their expressive potential, may give rise to a deep learning architecture that is provably superior to convolutional rectifier networks but has so far been overlooked by practitioners.

Inductive Bias of Deep Convolutional Networks through Pooling Geometry by Nadav Cohen, Amnon Shashua
Our formal understanding of the inductive bias that drives the success of convolutional networks on computer vision tasks is limited. In particular, it is unclear what makes hypotheses spaces born from convolution and pooling operations so suitable for natural images. In this paper we study the ability of convolutional arithmetic circuits to model correlations among regions of their input. Correlations are formalized through the notion of separation rank, which for a given input partition, measures how far a function is from being separable. We show that a polynomially sized deep network supports exponentially high separation ranks for certain input partitions, while being limited to polynomial separation ranks for others. The network's pooling geometry effectively determines which input partitions are favored, thus serves as a means for controlling the inductive bias. Contiguous pooling windows as commonly employed in practice favor interleaved partitions over coarse ones, orienting the inductive bias towards the statistics of natural images. In addition to analyzing deep networks, we show that shallow ones support only linear separation ranks, and by this gain insight into the benefit of functions brought forth by depth - they are able to efficiently model strong correlation under favored partitions of the input.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Saturday Morning Video: Benefits of depth in neural networks, Matus Telgarsky @ COLT2016

Representation Benefits of Deep Feedforward Networks by Matus Telgarsky
This note provides a family of classification problems, indexed by a positive integer k, where all shallow networks with fewer than exponentially (in k) many nodes exhibit error at least 1/6, whereas a deep network with 2 nodes in each of 2k layers achieves zero error, as does a recurrent network with 3 distinct nodes iterated k times. The proof is elementary, and the networks are standard feedforward networks with ReLU (Rectified Linear Unit) nonlinearities.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Saturday Morning Video: The Power of Depth for Feedforward Neural Networks by Ohad Shamir @ COLT2016

The attendant paper is here:

The Power of Depth for Feedforward Neural Networks by Ronen Eldan, Ohad Shamir

We show that there is a simple (approximately radial) function on $\reals^d$, expressible by a small 3-layer feedforward neural networks, which cannot be approximated by any 2-layer network, to more than a certain constant accuracy, unless its width is exponential in the dimension. The result holds for virtually all known activation functions, including rectified linear units, sigmoids and thresholds, and formally demonstrates that depth -- even if increased by 1 -- can be exponentially more valuable than width for standard feedforward neural networks. Moreover, compared to related results in the context of Boolean functions, our result requires fewer assumptions, and the proof techniques and construction are very different.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Saturday Morning Video: Randomized Algorithms in Linear Algebra, Ravi Kannan @ COLT2016

As Sebastien pointed out the COLT 2016 videos are out. Here is one: Ravi Kannan on Randomized Algorithms in Linear Algebra

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Friday, July 22, 2016

### Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks - implementation -

Ivan, the person behind the Tensor Train tensor decomposition just sent me the following:

Dear Igor,
I have a new interesting paper to share. " Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks"
http://arxiv.org/abs/1607.04228
(accepted at ACM RecSys 2016).
A framework is also available: https://github.com/Evfro/polaraThe key idea is to introduce a tensor from user-item-rating, thus being able to recommend even from a negative feedback.

With best wishes,
Ivan.

Thanks Ivan !

Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks by Evgeny Frolov, Ivan Oseledets
Conventional collaborative filtering techniques treat a top-n recommendations problem as a task of generating a list of the most relevant items. This formulation, however, disregards an opposite - avoiding recommendations with completely irrelevant items. Due to that bias, standard algorithms, as well as commonly used evaluation metrics, become insensitive to negative feedback. In order to resolve this problem we propose to treat user feedback as a categorical variable and model it with users and items in a ternary way. We employ a third-order tensor factorization technique and implement a higher order folding-in method to support online recommendations. The method is equally sensitive to entire spectrum of user ratings and is able to accurately predict relevant items even from a negative only feedback. Our method may partially eliminate the need for complicated rating elicitation process as it provides means for personalized recommendations from the very beginning of an interaction with a recommender system. We also propose a modification of standard metrics which helps to reveal unwanted biases and account for sensitivity to a negative feedback. Our model achieves state-of-the-art quality in standard recommendation tasks while significantly outperforming other methods in the cold-start "no-positive-feedback" scenarios.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Wednesday, July 20, 2016

### Random projections of random manifolds

Looks like some bounds have been tightened. Funny enough, in a matter of a week, I am either talking to or featuring someone from SpaceX, which just happens to deliver today or tomorrow a nanopore sequencer to the international space station

Small worlds!

Anyway, getting back to the paper.

Random projections of random manifolds by Subhaneil Lahiri, Peiran Gao, Surya Ganguli
A ubiquitous phenomenon is that interesting signals or data concentrate on low dimensional smooth manifolds inside a high dimensional ambient Euclidean space. Random projections are a simple and powerful tool for dimensionality reduction of such signals and data. Previous, seminal works have studied bounds on how the number of projections needed to preserve the geometry of these manifolds, at a given accuracy, scales with the intrinsic dimensionality, volume and curvature of the manifold. However, such works employ definitions of volume and curvature that are inherently difficult to compute. Therefore such theory cannot be easily tested against numerical simulations to quantitatively understand the tightness of the proven bounds. We instead study the typical distortions arising in random projections of an ensemble of smooth Gaussian random manifolds. In doing so, we find explicitly computable, approximate theoretical bounds on the number of projections required to preserve the geometric structure of these manifolds to a prescribed level of accuracy. Our bounds, while approximate, can only be violated with a probability that is exponentially small in the ambient dimension, and therefore they hold with high probability in most cases of practical interest. Moreover, unlike previous work, we test our theoretical bounds against numerical experiments on the actual geometric distortions that typically occur for random projections of random smooth manifolds. Through this comparison, we find our bounds are tighter than previous results by several orders of magnitude.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Tuesday, July 19, 2016

### Our proposal for a NIPS workshop on "Mapping Machine Learning to Hardware" has been submitted !

You probably have noticed recently an acceleration of the offering when it comes to computing hardware in Machine Learning. You may have also noticed that some algorithms seem to take into account the constraints of being in the binary world or that new and innovative technologies were trying to get in that area without knowing for sure what primitives they were improving. You might even have worndered what sort of technology will bring us closer to better efficiencies while still being flexible in terms of algorithm design. Eventually, as algorithm designer, you mmight have wondered about the sort of constraints your algorithm should be fulfilling. All these questions and many others are the reasons for this recent announcement on Nuit Blanche ( A proposal for a NIPS Workshop or Symposium on "Mapping Machine Learning to Hardware").With a truly outstanding set of people, we have been able to get the commitment from the following people as plenary speakers so that they can enlighten us about what is going on when it comes to the mapping for Machine Learning to hardware and more importantly what is the State of the Possible:

Here is the final proposal we sent out last night to the gods of NIPS.

With the recent success of Deep Learning and related techniques, we are beginning to see new specialized hardware or extensions to existing architectures dedicated for making training and inference computations faster or energy efficient or both. These technologies use either traditional CMOS technology on conventional von-Neumann architectures such as CPUs or accelerators such as DSPs, GPUs, FPGAs, and ASICs or other novel and exotic technologies in research phase such as  Memristors, Quantum Chips,  and Optical/Photonics. The overarching goal being to address a specific trade-off in mapping machine learning algorithms in general and deep learning in particular, to a specific underlying hardware technology.

Conversely, there has been quite an effort on the empirical side at devising deep network architectures for efficient implementation on low complexity hardware via low-rank tensor factorizations, structured matrix approximations, lower bit-depth like binary coefficients, compression and pruning to name a few approaches. This also has implications on leveraging appropriate hardware technology for inferencing primarily with energy and latency as the primary design goals.

These efforts are finding some traction in the signal processing and sparse/compressive sensing community to map the first layers of sensing hardware with the first layers of models such as deep networks. This approach has the potential of changing the way  sensing hardware, image reconstruction, signal processing and image understanding will be performed in the future.

This workshop aims to tie these seemingly disparate themes of co-design, architecture, algorithms and signal processing and bring together researchers at the interface of machine learning, hardware implementation, sensing, physics and biology for discussing the state of the art and the state of the possible.

The goals of the workshop are
• To present how machine learning computations and algorithms are mapped and co-designed with new hardware technologies and architectures
• To stimulate discussions on how machine learning algorithms can be co-designed or co-optimized with the underlying hardware technology to best take advantage of the synergy
• To evaluate the different trade-offs in accuracy, computational complexity, hardware cost, energy efficiency and application throughput currently investigated in these approaches
• To understand how data acquisition frameworks may be transformed to better interface with machine Learning algorithms
• To evaluate the constraints put forth on recent deep learning architectures so as to reduce redundancy and enable a simpler mapping between computing hardware and models.
• To enable a wider discussion on how Machine Learning algorithms and their primitives may provide a path toward exotic technology insertion in industrial roadmaps.

Besides the presentations made by the plenary speakers, there will be a poster session, lightning talks selected from the posters and a roundtable at the conclusion of the workshop. The workshop will be taped and presentations slides will be made available online. A white paper describing the talks and the discussions that went on during the workshop will be made available after the meeting.
Thanks to the following co-organizers, this proposal became a reality:
and a few other friends who transmitted the call inside their organizations.   Thank you !
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.