I learned about Spatial transformers at a Deep Learning meetup in Paris from a presentation by Alban Desmaison (yes I sometimes do attend other awesome meetups). Spatial transformers are "aiming at boosting the geometric invariance of CNNs" Transformers are using transformations from computer vision and camera models to help deep neural networks learn better or faster. They do so by providing models behind generic invariances like rotation, translation etc.... It is a little bit unsettling since most defenders of deep neural architectures would rather add data than get help from vision models :-) Anyway, let's see this as another instance of the Great Convergence where Machine Learning and Computer graphics, the old computer vision paradigm and signal processing meet. Today, we have a library written in Torch that describes several transformation for inclusion in neural networks.
Let me just add that while these transformers do help neural networks in reducing the number of network coefficients, the remaining coefficients are probably summarizing many of the geometric invariances that we collectively have not yet discovered -this is a sense I got from a presentation by Stephane Mallat a while back when he mentioned the connection between deep neural networks and his scattering networks-
gvnn: Neural Network Library for Geometric Computer Vision by Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, Andrew Davison
We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spirit of the original spatial transformers and allow backpropagation to enable end-to-end learning of a network involving any domain knowledge in geometric computer vision. This opens up applications in learning invariance to 3D geometric transformation for place recognition, end-to-end visual odometry, depth estimation and unsupervised learning through warping with a parametric transformation for image reconstruction error.
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.