
Monday, July 06, 2015

Compressing Neural Networks with the Hashing Trick - implementation -

I love this line of work: using the hashing trick to reduce the number of coefficients in current deep neural network architectures. As the paper notes, the coefficients in current networks seem to show some redundancy. In fact, the hashing trick looks like some sort of l_p regularization on the coefficients. More on that later...
 
 

Compressing Neural Networks with the Hashing Trick by Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen

As deep nets are increasingly used in applications suited for mobile devices, a fundamental dilemma becomes apparent: the trend in deep learning is to grow models to absorb ever-increasing data set sizes; however mobile devices are designed with very little memory and cannot store such large models. We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value. These parameters are tuned to adjust to the HashedNets weight sharing architecture with standard backprop during training. Our hashing procedure introduces no additional memory overhead, and we demonstrate on several benchmark data sets that HashedNets shrink the storage requirements of neural networks substantially while mostly preserving generalization performance.
The paper says the code is available here, but it is not there yet.
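In the meantime, here is a minimal numpy sketch of the weight-sharing idea, my own illustration rather than the authors' code: every virtual weight of a layer is mapped to one of a small number of shared parameters, so only the bucket values need to be stored. The layer sizes, the bucket count and the pseudo-random mapping standing in for the low-cost hash function are all assumptions made for the example.

```python
import numpy as np

def hashed_layer(x, params, n_in, n_out, seed=0):
    """Toy HashedNets-style layer: every virtual weight w[i, j] points to one of
    len(params) shared values via a fixed pseudo-random mapping (a stand-in for
    the paper's low-cost hash function)."""
    rng = np.random.default_rng(seed)                      # fixed seed = reproducible mapping
    bucket = rng.integers(0, len(params), size=(n_in, n_out))
    W = params[bucket]    # materialized here for clarity; a real implementation would
                          # index params on the fly and never store W densely
    return x @ W

# A 784 -> 100 layer (78,400 virtual weights) compressed into 1,000 shared parameters.
params = 0.01 * np.random.randn(1000)
x = np.random.randn(5, 784)
print(hashed_layer(x, params, n_in=784, n_out=100).shape)  # (5, 100)
```

During training, the gradient of a shared parameter is just the sum of the gradients of all virtual weights hashed into its bucket, which is what lets standard backprop tune the compressed layer.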
 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Near-Optimal Estimation of Simultaneously Sparse and Low-Rank Matrices from Nested Linear Measurements

 

A while back, it was mentioned that using multiple convex regularizers may not be a good enough way to reduce sampling bounds while simultaneously enforcing different priors on the structure of the signal (see the video Convex Relaxations for Recovering Simultaneously Structured Objects and the attendant preprint). During COLT, Maryam pointed me to another approach, featured in the following new preprint. In their approach, a non-convex regularizer was used; here, the approach is to use nested measurement ensembles.
 
Near-Optimal Estimation of Simultaneously Sparse and Low-Rank Matrices from Nested Linear Measurements by Sohail Bahmani, Justin Romberg

In this paper we consider the problem of estimating simultaneously low-rank and row-wise sparse matrices from nested linear measurements where the linear operator consists of the product of a linear operator W and a matrix Ψ. Leveraging the nested structure of the measurement operator, we propose a computationally efficient two-stage algorithm for estimating the simultaneously structured target matrix. Assuming that W is a restricted isometry for low-rank matrices and Ψ is a restricted isometry for row-wise sparse matrices, we establish an accuracy guarantee that holds uniformly for all sufficiently low-rank and row-wise sparse matrices with high probability. Furthermore, using standard tools from information theory, we establish a minimax lower bound for estimation of simultaneously low-rank and row-wise sparse matrices from linear measurements that need not be nested. The accuracy bounds established for the algorithm, that also serve as a minimax upper bound, differ from the derived minimax lower bound merely by a polylogarithmic factor of the dimensions. Therefore, the proposed algorithm is nearly minimax optimal. We also discuss some applications of the proposed observation model and evaluate our algorithm through numerical simulation.
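As I read the measurement model, Ψ first mixes the rows of the simultaneously structured target (producing a low-rank intermediate matrix) and W then takes generic linear measurements of that intermediate. A naive two-stage sketch in cvxpy, nuclear-norm recovery followed by an ℓ2,1 (row-group) recovery, could then look like the following; the dimensions, the Gaussian ensembles and the equality-constrained programs are illustrative assumptions, and this is not the authors' algorithm, which comes with the guarantees described in the abstract.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
p, q, r, s = 30, 8, 1, 3        # target X0 is p x q, rank r, with s nonzero rows
m1, m2 = 15, 90                 # sizes of Psi and of the second-stage sketch (assumed)

# Ground truth: simultaneously row-sparse and low-rank
rows = rng.choice(p, s, replace=False)
X0 = np.zeros((p, q))
X0[rows] = rng.standard_normal((s, r)) @ rng.standard_normal((r, q))

Psi = rng.standard_normal((m1, p)) / np.sqrt(m1)     # acts on the row-sparse structure
Ws = rng.standard_normal((m2, m1, q)) / np.sqrt(m2)  # generic measurements of the m1 x q intermediate
B0 = Psi @ X0
y = np.array([np.sum(Wi * B0) for Wi in Ws])         # nested measurements y_i = <W_i, Psi X0>

# Stage 1: recover the low-rank intermediate B = Psi X0 by nuclear-norm minimization.
B = cp.Variable((m1, q))
cp.Problem(cp.Minimize(cp.normNuc(B)),
           [cp.sum(cp.multiply(Wi, B)) == yi for Wi, yi in zip(Ws, y)]).solve()

# Stage 2: recover the row-sparse X from Psi X = B_hat by l2,1 (row-group) minimization.
X = cp.Variable((p, q))
cp.Problem(cp.Minimize(cp.mixed_norm(X, 2, 1)), [Psi @ X == B.value]).solve()

print("relative error:", np.linalg.norm(X.value - X0) / np.linalg.norm(X0))
```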
 
Thanks Maryam for the heads-up ! 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Friday, July 03, 2015

Proceedings of The 28th Conference on Learning Theory - COLT 2015

So the COLT conference started this morning in sweltering Paris. Many of the presentations have been featured in one fashion or another on Nuit Blanche. Here are the full proceedings:

 

Regular Papers

  • On Consistent Surrogate Risk Minimization and Property Elicitation, Arpit Agarwal, Shivani Agarwal [abs] [pdf]
  • Online Learning with Feedback Graphs: Beyond Bandits, Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren [abs] [pdf]
  • Learning Overcomplete Latent Variable Models through Tensor Methods, Animashree Anandkumar, Rong Ge, Majid Janzamin [abs] [pdf]
  • Simple, Efficient, and Neural Algorithms for Sparse Coding, Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra [abs] [pdf]
  • Label optimal regret bounds for online local learning, Pranjal Awasthi, Moses Charikar, Kevin A Lai, Andrej Risteski [abs] [pdf]
  • Efficient Learning of Linear Separators under Bounded Noise, Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, Ruth Urner [abs] [pdf]
  • Efficient Representations for Lifelong Learning and Autoencoding, Maria-Florina Balcan, Avrim Blum, Santosh Vempala [abs] [pdf]
  • Optimally Combining Classifiers Using Unlabeled Data, Akshay Balsubramani, Yoav Freund [abs] [pdf]
  • Minimax Fixed-Design Linear Regression, Peter L. Bartlett, Wouter M. Koolen, Alan Malek, Eiji Takimoto, Manfred K. Warmuth [abs] [pdf]
  • Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions, Alexandre Belloni, Tengyuan Liang, Hariharan Narayanan, Alexander Rakhlin [abs] [pdf]
  • Bandit Convex Optimization: √T Regret in One Dimension, Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres [abs] [pdf]
  • The entropic barrier: a simple and optimal universal self-concordant barrier, Sébastien Bubeck, Ronen Eldan [abs] [pdf]
  • Optimum Statistical Estimation with Strategic Data Sources, Yang Cai, Constantinos Daskalakis, Christos Papadimitriou [abs] [pdf]
  • On the Complexity of Learning with Kernels, Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir [abs] [pdf]
  • Learnability of Solutions to Conjunctive Queries: The Full Dichotomy, Hubie Chen, Matthew Valeriote [abs] [pdf]
  • Sequential Information Maximization: When is Greedy Near-optimal?, Yuxin Chen, S. Hamed Hassani, Amin Karbasi, Andreas Krause [abs] [pdf]
  • Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification, Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, Shang-Hua Teng [abs] [pdf]
  • Stochastic Block Model and Community Detection in Sparse Graphs: A spectral algorithm with optimal rate of recovery, Peter Chin, Anup Rao, Van Vu [abs] [pdf]
  • On-Line Learning Algorithms for Path Experts with Non-Additive Losses, Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Manfred Warmuth [abs] [pdf]
  • Truthful Linear Regression, Rachel Cummings, Stratis Ioannidis, Katrina Ligett [abs] [pdf]
  • A PTAS for Agnostically Learning Halfspaces, Amit Daniely [abs] [pdf]
  • S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification, Gautam Dasarathy, Robert Nowak, Xiaojin Zhu [abs] [pdf]
  • Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems, Yash Deshpande, Andrea Montanari [abs] [pdf]
  • Contextual Dueling Bandits, Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi [abs] [pdf]
  • Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering, Justin Eldridge, Mikhail Belkin, Yusu Wang [abs] [pdf]
  • Faster Algorithms for Testing under Conditional Sampling, Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh [abs] [pdf]
  • Learning and inference in the presence of corrupted inputs, Uriel Feige, Yishay Mansour, Robert Schapire [abs] [pdf]
  • From Averaging to Acceleration, There is Only a Step-size, Nicolas Flammarion, Francis Bach [abs] [pdf]
  • Variable Selection is Hard, Dean Foster, Howard Karloff, Justin Thaler [abs] [pdf]
  • Vector-Valued Property Elicitation, Rafael Frongillo, Ian A. Kash [abs] [pdf]
  • Competing with the Empirical Risk Minimizer in a Single Pass, Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford [abs] [pdf]
  • A Chaining Algorithm for Online Nonparametric Regression, Pierre Gaillard, Sébastien Gerchinovitz [abs] [pdf]
  • Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition, Rong Ge, Furong Huang, Chi Jin, Yang Yuan [abs] [pdf]
  • Learning the dependence structure of rare events: a non-asymptotic study, Nicolas Goix, Anne Sabourin, Stéphan Clémençon [abs] [pdf]
  • Thompson Sampling for Learning Parameterized Markov Decision Processes, Aditya Gopalan, Shie Mannor [abs] [pdf]
  • Computational Lower Bounds for Community Detection on Random Graphs, Bruce Hajek, Yihong Wu, Jiaming Xu [abs] [pdf]
  • Adaptive Recovery of Signals by Convex Optimization, Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski, Dmitry Ostrovsky [abs] [pdf]
  • Tensor principal component analysis via sum-of-square proofs, Samuel B. Hopkins, Jonathan Shi, David Steurer [abs] [pdf]
  • Fast Exact Matrix Completion with Finite Samples, Prateek Jain, Praneeth Netrapalli [abs] [pdf]
  • Exp-Concavity of Proper Composite Losses, Parameswaran Kamalaruban, Robert Williamson, Xinhua Zhang [abs] [pdf]
  • On Learning Distributions from their Samples, Sudeep Kamath, Alon Orlitsky, Dheeraj Pichapati, Ananda Theertha Suresh [abs] [pdf]
  • MCMC Learning, Varun Kanade, Elchanan Mossel [abs] [pdf]
  • Online with Spectral Bounds, Zohar Karnin, Edo Liberty [abs] [pdf]
  • Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem, Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa [abs] [pdf]
  • Second-order Quantile Methods for Experts and Combinatorial Games, Wouter M. Koolen, Tim Van Erven [abs] [pdf]
  • Hierarchical Label Queries with Data-Dependent Partitions, Samory Kpotufe, Ruth Urner, Shai Ben-David [abs] [pdf]
  • Algorithms for Lipschitz Learning on Graphs, Rasmus Kyng, Anup Rao, Sushant Sachdeva, Daniel A. Spielman [abs] [pdf]
  • Low Rank Matrix Completion with Exponential Family Noise, Jean Lafond [abs] [pdf]
  • Bad Universal Priors and Notions of Optimality, Jan Leike, Marcus Hutter [abs] [pdf]
  • Learning with Square Loss: Localization through Offset Rademacher Complexity, Tengyuan Liang, Alexander Rakhlin, Karthik Sridharan [abs] [pdf]
  • Achieving All with No Parameters: AdaNormalHedge, Haipeng Luo, Robert E. Schapire [abs] [pdf]
  • Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization, Mehrdad Mahdavi, Lijun Zhang, Rong Jin [abs] [pdf]
  • Correlation Clustering with Noisy Partial Information, Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan [abs] [pdf]
  • Online Density Estimation of Bradley-Terry Models, Issei Matsumoto, Kohei Hatano, Eiji Takimoto [abs] [pdf]
  • First-order regret bounds for combinatorial semi-bandits, Gergely Neu [abs] [pdf]
  • Norm-Based Capacity Control in Neural Networks, Behnam Neyshabur, Ryota Tomioka, Nathan Srebro [abs] [pdf]
  • Cortical Learning via Prediction, Christos H. Papadimitriou, Santosh S. Vempala [abs] [pdf]
  • Partitioning Well-Clustered Graphs: Spectral Clustering Works!, Richard Peng, He Sun, Luca Zanetti [abs] [pdf]
  • Batched Bandit Problems, Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg [abs] [pdf]
  • Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints, Alexander Rakhlin, Karthik Sridharan [abs] [pdf]
  • Fast Mixing for Discrete Point Processes, Patrick Rebeschini, Amin Karbasi [abs] [pdf]
  • Generalized Mixability via Entropic Duality, Mark D. Reid, Rafael M. Frongillo, Robert C. Williamson, Nishant Mehta [abs] [pdf]
  • On the Complexity of Bandit Linear Optimization, Ohad Shamir [abs] [pdf]
  • An Almost Optimal PAC Algorithm, Hans U. Simon [abs] [pdf]
  • Minimax rates for memory-bounded sparse linear regression, Jacob Steinhardt, John Duchi [abs] [pdf]
  • Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery, Thomas Steinke, Jonathan Ullman [abs] [pdf]
  • Convex Risk Minimization and Conditional Probability Estimation, Matus Telgarsky, Miroslav Dudík, Robert Schapire [abs] [pdf]
  • Regularized Linear Regression: A Precise Analysis of the Estimation Error, Christos Thrampoulidis, Samet Oymak, Babak Hassibi [abs] [pdf]
  • Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity, Santosh S. Vempala, Ying Xiao [abs] [pdf]
  • On Convergence of Emphatic Temporal-Difference Learning, H. Yu [abs] [pdf]

Open Problems

  • Open Problem: Restricted Eigenvalue Condition for Heavy Tailed Designs, Arindam Banerjee, Sheng Chen, Vidyashankar Sivakumar [abs] [pdf]
  • Open Problem: The landscape of the loss surfaces of multilayer networks, Anna Choromanska, Yann LeCun, Gérard Ben Arous [abs] [pdf]
  • Open Problem: The Oracle Complexity of Smooth Convex Optimization in Nonstandard Settings, Cristóbal Guzmán [abs] [pdf]
  • Open Problem: Online Sabotaged Shortest Path, Wouter M. Koolen, Manfred K. Warmuth, Dmitri Adamskiy [abs] [pdf]
  • Open Problem: Learning Quantum Circuits with Queries, Jeremy Kun, Lev Reyzin [abs] [pdf]
  • Open Problem: Recursive Teaching Dimension Versus VC Dimension, Hans U. Simon, Sandra Zilles [abs] [pdf]
 
 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Compressed Sensing of Multi-Channel EEG Signals: The Simultaneous Cosparsity and Low Rank Optimization

 

Compressed Sensing of Multi-Channel EEG Signals: The Simultaneous Cosparsity and Low Rank Optimization by Yipeng Liu, Maarten De Vos, Sabine Van Huffel

Goal: This paper deals with the problems that some EEG signals have no good sparse representation and single channel processing is not computationally efficient in compressed sensing of multi-channel EEG signals. Methods: An optimization model with L0 norm and Schatten-0 norm is proposed to enforce cosparsity and low rank structures in the reconstructed multi-channel EEG signals. Both convex relaxation and global consensus optimization with alternating direction method of multipliers are used to compute the optimization model. Results: The performance of multi-channel EEG signal reconstruction is improved in term of both accuracy and computational complexity. Conclusion: The proposed method is a better candidate than previous sparse signal recovery methods for compressed sensing of EEG signals. Significance: The proposed method enables successful compressed sensing of EEG signals even when the signals have no good sparse representation. Using compressed sensing would much reduce the power consumption of wireless EEG system.
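Not the ADMM/global-consensus algorithm of the paper, but here is a small cvxpy sketch of the kind of convex relaxation being described: an ℓ1 analysis (cosparsity) term plus a nuclear-norm term, fit to per-channel compressed measurements. The second-order difference operator standing in for the analysis dictionary, the toy piecewise-linear signals and the weighting are my assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, c, m = 128, 8, 48            # samples per channel, channels, measurements per channel

# Toy multi-channel signal: piecewise-linear in time (so its second-order differences
# are sparse, i.e. cosparse) and strongly correlated across channels (rank 2).
t = np.linspace(0, 1, n)
basis = np.vstack([np.interp(t, [0, .3, .7, 1], [0, 1, -1, 0]),
                   np.interp(t, [0, .5, 1], [1, -1, 1])])
X0 = rng.standard_normal((c, 2)) @ basis             # c x n channel matrix
A = rng.standard_normal((m, n)) / np.sqrt(m)         # same sensing matrix for every channel
Y = X0 @ A.T                                         # compressed measurements, c x m

# Second-order difference operator as the analysis (cosparsifying) operator.
Omega = (np.eye(n) - 2 * np.eye(n, k=1) + np.eye(n, k=2))[: n - 2]

X = cp.Variable((c, n))
lam = 0.1                                            # assumed weighting between the two terms
objective = cp.sum(cp.abs(X @ Omega.T)) + lam * cp.normNuc(X)
cp.Problem(cp.Minimize(objective), [cp.norm(X @ A.T - Y, "fro") <= 1e-3]).solve()
print("relative error:", np.linalg.norm(X.value - X0) / np.linalg.norm(X0))
```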
 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Thursday, July 02, 2015

The Loss Surfaces of Multilayer Networks, Global Optimality in Tensor Factorization, Deep Learning, and Beyond, Generalization Bounds for Neural Networks through Tensor Factorization

Yesterday, Yann was talking at the Challenge in Optimization for Data Science workshop. One of his slides featured the distribution of losses across different neural network sizes (each color representing a specific network size; for each size, several randomly chosen networks were trained). The figure above was extracted from [1], but it was the one shown in the slide. The crux of the argument was that, by becoming larger, neural networks trained from initially random weights could see a concentration of their final losses, i.e., once the networks are large enough, the non-convexity is not that bad because even the local minima are in fact not so far from the global minimum. Roughly speaking, enlarging or lifting the phase space of hyperparameters allows the loss to converge toward zero while, at the same time, these losses become more and more concentrated. Understanding how we can train these networks is important because we currently have no idea how to really streamline the process, nor do we have a particular understanding of the sampling bounds. It turns out that I had a very similar discussion with the folks at Kalray, who were presenting their architectures to the NeuroSTIC people, who also happen to have their annual meeting just a few meters away.
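As a purely illustrative sketch, nothing like the spin-glass setup of [1], one can get a feel for this concentration effect by training, say with scikit-learn, several networks per hidden-layer width from different random initializations on synthetic data and looking at the spread of the final training losses; the data, widths and seeds below are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic classification data; widths and seeds are arbitrary illustrative choices.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=0)

for width in [2, 8, 32, 128]:
    losses = []
    for seed in range(10):                      # several random initializations per size
        net = MLPClassifier(hidden_layer_sizes=(width,), max_iter=500,
                            random_state=seed).fit(X, y)
        losses.append(net.loss_)                # final training loss
    print(f"width={width:4d}  mean loss={np.mean(losses):.3f}  spread={np.ptp(losses):.3f}")
```

On such toy data the spread of final losses tends to shrink as the hidden layer grows, which is the qualitative behavior shown in the slide.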

Coming back to this lifting argument, here are two papers that mention that paper ([1] The Loss Surfaces of Multilayer Networks by Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun), and both take the path of tensorization as a way to do the lifting: first, there is [3] Generalization Bounds for Neural Networks through Tensor Factorization by Majid Janzamin, Hanie Sedghi, Anima Anandkumar, and then there is [2] Global Optimality in Tensor Factorization, Deep Learning, and Beyond by Benjamin D. Haeffele, Rene Vidal.

 From [3] one can read:  
Recently, Choromanska et al. (2015) analyze the loss surface of a multi-layer ReLU network by relating it to a spin glass system. They make several assumptions such as variable independence for the input, redundancy in network parameterization and so on. They show that under these assumptions, the lowest critical values of the random loss function form a layered structure, and the number of local minima outside that band diminishes exponentially with the network size.
Aside from the sampling bound they give for training, their approach is embarrassingly parallel, which has a big impact on the eventual hardware it will run on. From [2] one can read:
Additionally, we have also shown that if the size of the network is allowed to be large enough then for any initialization a global minimizer can be found from purely local descent, and thus local minima are all equivalent. This is a similar conclusion to the work of Choromanska et al. (2014), who analyzed the problem from a statistical standpoint and showed that with sufficiently large networks and appropriate assumptions about the distributions of the data and network weights, then with high probability any family of networks learned with different initializations will have similar objective values, but we note that our results allow for a well defined set of conditions which will be sufficient to guarantee the property. Finally, many modern large scale networks do not use traditional regularization on the network weight parameters such as an l1 or l2 norms during training and instead rely on alternative forms of regularization such as dropout as it tends to achieve better performance in practice (Srivastava et al., 2014; Krizhevsky et al., 2012; Wan et al., 2013). Given our commentary above regarding the critical importance of balancing the degree of homogeneity between the mapping and the regularizer, an immediate prediction of our analysis is that simply ensuring that the degrees of homogeneity are balanced could be a significant factor in improving the performance of deep networks.
I can't wait to see how this discussion on mapping and regularizer will turn out. Without further ado, here are the abstracts:
 [1] The Loss Surfaces of Multilayer Networks
Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun

We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network through the prism of the results from random matrix theory. We show that for large-size decoupled networks the lowest critical values of the random loss function form a layered structure and they are located in a well-defined band lower-bounded by the global minimum. The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of high dependencies in real networks. We conjecture that both simulated annealing and SGD converge to the band of low critical points, and that all critical points found there are local minima of high quality measured by the test error. This emphasizes a major difference between large- and small-size networks where for the latter poor quality local minima have non-zero probability of being recovered. Finally, we prove that recovering the global minimum becomes harder as the network size increases and that it is in practice irrelevant as global minimum often leads to overfitting.
[2] Global Optimality in Tensor Factorization, Deep Learning, and Beyond
Benjamin D. Haeffele, Rene Vidal

Techniques involving factorization are found in a wide range of applications and have enjoyed significant empirical success in many fields. However, common to a vast majority of these problems is the significant disadvantage that the associated optimization problems are typically non-convex due to a multilinear form or other convexity destroying transformation. Here we build on ideas from convex relaxations of matrix factorizations and present a very general framework which allows for the analysis of a wide range of non-convex factorization problems - including matrix factorization, tensor factorization, and deep neural network training formulations. We derive sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum and show that if the size of the factorized variables is large enough then from any initialization it is possible to find a global minimizer using a purely local descent algorithm. Our framework also provides a partial theoretical justification for the increasingly common use of Rectified Linear Units (ReLUs) in deep neural networks and offers guidance on deep network architectures and regularization strategies to facilitate efficient optimization.
[3] Generalization Bounds for Neural Networks through Tensor Factorization
Majid Janzamin, Hanie Sedghi, Anima Anandkumar

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for training a two-layer neural network. We prove efficient generalization bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for generalizability. Our training method is based on tensor decomposition, which provably converges to the global optimum, under a set of mild non-degeneracy conditions. It consists of simple embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD), in terms of computational complexity. Thus, for the first time, we have a computationally efficient method with guaranteed generalization bounds for training neural networks.
 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Wednesday, July 01, 2015

Nuit Blanche in Review ( June 2015 )

Today, I will probably be at the Challenge in Optimization for Data Science workshop.

Since the last Nuit Blanche in Review ( May 2015 ) we certainly got closer to Pluto and Charon. We covered a few subjects ranging from 3rd generation genome sequencing (1) to hardware for machine learning (2) to a long series of posts on random projections (3-7), and more:

  1. Compressive phase transition, a million genomes and 3rd generation sequencing 
  2.  Hardware for Machine Learning
  3.  Slides: Learning with Random Projections, Random Projections for Machine Learning and Data Mining: Theory & Applications   
  4. Random Projections as Regularizers, Compressive Linear Least Squares Regression and more
    The Unreasonable Effectiveness of Random Projections in Computer Science 
  5. Improved Bounds on the Dot Product under Random Projection and Random Sign Projection
    Random Features and random projections
  6. Random Maxout Features look like Quantized JL Embeddings 
  7. Extreme Compressive Sampling for Covariance Estimation 
From these, we shared two insights:

and quite a few implementations:
Some theses:

In-depth


Meetups:

Videos:


Conferences:


Videos:
Other:



Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Tuesday, June 30, 2015

Vowpal Wabbit: A Machine Learning System ( Paris Machine Learning Meetup "Hors Série" #4 Season 2 )


Following up on meetup #6 Season 2, John Langford just gave us a tutorial presentation on Vowpal Wabbit this morning in Paris (Thank you Christophe and AXA for hosting us). Here are the slides:
Here is some additional relevant background material.
Much like the example Leon Bottou gave us on counterfactual reasoning a year ago (see his slides: Learning to Interact), I very much liked the exploration bit for policy evaluation: if you don't explore, you just don't know, and prediction errors are not controlled without exploration.
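To make that point concrete, here is a tiny numpy illustration, not Vowpal Wabbit and not taken from the slides, of off-policy (counterfactual) evaluation with inverse propensity scoring under an epsilon-greedy logging policy; the action set, reward means and epsilon are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, eps = 4, 100_000, 0.1
true_reward = np.array([0.2, 0.5, 0.7, 0.4])     # assumed per-action mean rewards

# Logging policy: epsilon-greedy around action 1, with the propensities recorded.
probs = np.full(K, eps / K)
probs[1] += 1 - eps
actions = rng.choice(K, size=n, p=probs)
rewards = rng.binomial(1, true_reward[actions])

# Counterfactual value of the policy "always play action 2" via inverse propensity scoring.
target = 2
ips = np.mean((actions == target) * rewards / probs[actions])
print(f"IPS estimate: {ips:.3f}   true value: {true_reward[target]:.3f}")
# With eps = 0 the target action is never logged (probs[target] = 0) and the
# estimate is undefined: if you don't explore, you just don't know.
```

Set eps to zero and the propensity of the target action vanishes: the counterfactual estimate is no longer defined, which is exactly the point about uncontrolled errors without exploration.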


Learning to search will also be the subject of John's presentation at ICML in Lille next week:

Learning to Search Better than Your Teacher by Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daume, John Langford


 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Monday, June 29, 2015

Slides and Implementation: Random forests with random projections of the output space for high dimensional multi-label classification

 
 

The implementation of the random output trees is available at: https://github.com/arjoly/random-output-trees
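For flavor, here is a toy sketch of the underlying idea using plain scikit-learn rather than the package above: project the label indicator matrix with a random Gaussian matrix, fit a forest on the projected targets, and decode the predictions with the pseudo-inverse of the projection. This is a simplified variant for illustration only; the actual method and the random-output-trees implementation handle the projection and the decoding differently.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, Y = make_multilabel_classification(n_samples=2000, n_features=30, n_classes=50,
                                      n_labels=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Randomly project the 50-dimensional label space down to k dimensions, fit a forest
# on the projected targets, then decode predictions with the pseudo-inverse.
k = 15                                                   # assumed projection size
P = rng.standard_normal((Y.shape[1], k)) / np.sqrt(k)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, Y_tr @ P)
Y_hat = (forest.predict(X_te) @ np.linalg.pinv(P) > 0.5).astype(int)
print("Hamming accuracy:", (Y_hat == Y_te).mean())
```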

 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Thesis: Learning in High Dimensions with Projected Linear Discriminants by Robert Durrant

 
The enormous power of modern computers has made possible the statistical modelling of data with dimensionality that would have made this task inconceivable only decades ago. However, experience in such modelling has made researchers aware of many issues associated with working in high-dimensional domains, collectively known as `the curse of dimensionality', which can confound practitioners' desires to build good models of the world from these data. When the dimensionality is very large, low-dimensional methods and geometric intuition both break down in these high-dimensional spaces. To mitigate the dimensionality curse we can use low-dimensional representations of the original data that capture most of the information it contained. However, little is currently known about the effect of such dimensionality reduction on classifier performance. In this thesis we develop theory quantifying the effect of random projection (a recent, very promising, non-adaptive dimensionality reduction technique) on the classification performance of Fisher's Linear Discriminant (FLD), a successful and widely-used linear classifier. We tackle the issues associated with small sample size and high-dimensionality by using randomly projected FLD ensembles, and we develop theory explaining why our new approach performs well. Finally, we quantify the generalization error of Kernel FLD, a related non-linear projected classifier.
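A toy sketch of the randomly projected FLD ensemble idea, with scikit-learn's LDA standing in for the Fisher discriminant: each ensemble member works in its own randomly projected low-dimensional space, which sidesteps the small-sample, high-dimension issue. The projected dimension, the ensemble size and the majority-vote combination are illustrative choices rather than the thesis' exact recipe.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Small-sample, high-dimensional setting: 200 points in 500 dimensions.
X, y = make_classification(n_samples=200, n_features=500, n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

k, n_members = 20, 25                 # projected dimension and ensemble size (assumed)
votes = np.zeros(len(X_te))
for _ in range(n_members):
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)    # random Gaussian projection
    fld = LinearDiscriminantAnalysis().fit(X_tr @ R, y_tr)   # FLD in the projected space
    votes += fld.predict(X_te @ R)
y_hat = (votes / n_members > 0.5).astype(int)                # majority vote over members
print("ensemble accuracy:", (y_hat == y_te).mean())
```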
 
 
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
