
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Fast linear transforms are ubiquitous in machine learning, including the...

Learning Compressed Transforms with Low Displacement Rank
The low displacement rank (LDR) framework for structured matrices repres...

Sparse Linear Networks with a Fixed Butterfly Structure: Theory and Practice
Fast Fourier transform, Wavelets, and other well-known transforms in sig...

Handcrafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer
In this paper, we seek to reduce the computation complexity of transform...

Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices
Structured matrices, such as those derived from Kronecker products (KP),...

LambdaNetworks: Modeling Long-Range Interactions Without Attention
We present lambda layers – an alternative framework to self-attention – ...

Structured Transforms for Small-Footprint Deep Learning
We consider the task of building compact deep learning pipelines suitabl...
Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps. However, choosing which of the myriad structured transformations to use (and its associated parameterization) is a laborious task that requires trading off speed, space, and accuracy. We consider a different approach: we introduce a family of matrices called kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space (parameter) and time (arithmetic operation) complexity. We empirically validate that K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures, in order to improve model quality. For example, replacing channel shuffles in ShuffleNet improves classification accuracy on ImageNet by up to 5%. K-matrices can also simplify hand-engineered pipelines: we replace filter bank feature computation in speech data preprocessing with a learnable kaleidoscope layer, resulting in only 0.4% loss in accuracy. In addition, K-matrices can capture latent structure in models: for a challenging permuted image classification task, a K-matrix-based representation of permutations is able to learn the right latent structure and improves the accuracy of a downstream convolutional model by over 9 points. We provide a practical implementation of our approach, and use K-matrices in a Transformer network to attain a 36% faster end-to-end inference speed on a language translation task.
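K-matrices are built from products of butterfly factors: block-diagonal matrices in which each block mixes pairs of coordinates at a fixed stride, so a product of log2(n) such factors has O(n log n) parameters. The following is a minimal NumPy sketch of this factorization, not the paper's implementation; the function names, random initialization, and dense representation are illustrative assumptions for clarity (a real implementation would apply each factor in O(n) without materializing dense matrices).

```python
import numpy as np

def butterfly_factor(n, block, rng):
    """One butterfly factor: a block-diagonal n x n matrix whose
    diagonal blocks of size `block` each mix coordinate pairs at
    stride block // 2. Only 2n entries are nonzero."""
    F = np.zeros((n, n))
    half = block // 2
    for start in range(0, n, block):
        for j in range(half):
            i1, i2 = start + j, start + j + half
            # Random 2x2 mixing weights; in a learned model these
            # would be trainable parameters (illustrative assumption).
            a, b, c, d = rng.standard_normal(4)
            F[i1, i1], F[i1, i2] = a, b
            F[i2, i1], F[i2, i2] = c, d
    return F

def butterfly_matrix(n, seed=0):
    """Product of log2(n) butterfly factors, from block size n down
    to 2. Applying the factors one at a time to a vector costs
    O(n log n) arithmetic operations."""
    rng = np.random.default_rng(seed)
    B = np.eye(n)
    block = n
    while block >= 2:
        B = butterfly_factor(n, block, rng) @ B
        block //= 2
    return B

# Multiply a vector by a random 8 x 8 butterfly matrix.
B = butterfly_matrix(8)
x = np.random.default_rng(1).standard_normal(8)
y = B @ x
```

Each factor stores only 2n nonzeros, so for n = 8 the three factors hold 48 parameters versus 64 for a dense matrix; the gap widens as n grows, which is the space/time advantage the abstract refers to.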