MIT Stata Center

Invited Workshop on Compiler Techniques for Sparse Tensor Algebra

Cambridge, MA, January 26th

Organizers: Fredrik Kjolstad and Saman Amarasinghe

There has been considerable recent interest and innovation in compiler technology for high-performance sparse tensor computation. This workshop will bring together relevant parties from academia and industry to discuss individual approaches, how they relate, and whether the ideas can be combined.

Program

There will be eight sessions with three or four speakers each. Each session consists of ten-minute talks followed by a twenty-minute discussion among the speakers and the audience.
Time Session
8:00 - 8:20 Registration
8:20 - 8:30 Welcome
8:30 - 9:20 Sparse Algorithms and Data Structures I
Chaired by Saman Amarasinghe (MIT)
Jiajia Li (Pacific Northwest National Laboratory)
Shaden Smith (Intel Labs)
Edgar Solomonik (University of Illinois at Urbana-Champaign)
9:20 - 9:30 Break
9:30 - 10:20 Sparse Algorithms and Data Structures II
Chaired by Maryam Dehnavi (University of Toronto)
Muthu Baskaran (Reservoir Labs)
Jee Choi (University of Oregon)
Duane Merrill and Michael Garland (NVIDIA Research)
10:20 - 10:30 Break
10:30 - 11:20 Sparse Polyhedral Optimization
Chaired by Shoaib Kamil (Adobe Research)
Mary Hall (University of Utah)
Cathie Olschanowsky (Boise State University)
Michelle Strout (University of Arizona)
11:20 - 11:30 Break
11:30 - 12:30 Tensor Compilers I
Chaired by Stephen Chou (MIT)
Dougal Maclaurin (Google)
P. (Saday) Sadayappan (Ohio State University)
Hongbo Rong (Intel Labs)
Keno Fischer (Julia Computing)
12:30 - 1:30 Lunch
1:30 - 2:20 Tensor Compilers II
Chaired by Chris Fletcher (University of Illinois at Urbana-Champaign)
Maryam Dehnavi (University of Toronto)
Fredrik Kjolstad (MIT)
Stephen Chou (MIT)
2:20 - 2:35 Break
2:35 - 3:25 Tensors and Graphs
Chaired by Fredrik Kjolstad (MIT)
Justin Solomon (MIT)
Jeremy Kepner (Lincoln Laboratory)
Julian Shun (MIT)
3:25 - 3:40 Break
3:40 - 4:30 Hardware for Sparse Tensor Algebra
Chaired by Joel Emer (NVIDIA Research and MIT)
Michael Pellauer (NVIDIA Research)
Hanrui Wang (MIT)
Chris Fletcher (University of Illinois at Urbana-Champaign)
4:30 - 4:45 Break
4:45 - 5:35 Applications in Science and Engineering
Chaired by Peter Ahrens (MIT)
Srinivas Eswar (Georgia Tech)
Richard Mills (Argonne National Laboratory)
Miles Stoudenmire (Flatiron Institute)
5:35 - 6:00 Discussion
6:00 - 8:00 Dinner and Poster Session

Papers

The following papers were suggested by workshop participants.
Run-time parallelization in a polyhedral framework
Automating wavefront parallelization for sparse matrix computations Anand Venkat, Mahdi Soltan Mohammadi, Jongsoo Park, Hongbo Rong, Rajkishore Barik, Michelle Mills Strout, and Mary Hall SC 2016 https://ieeexplore.ieee.org/document/7877119/
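The core idea in the paper above is the inspector-executor pattern: an inspector runs at execution time, walks the sparsity structure, and groups loop iterations into dependence levels (wavefronts); the executor then runs each level in parallel. A minimal Python sketch for sparse lower-triangular solve over CSR, with our own function names rather than anything from the paper:

```python
import numpy as np

def wavefront_levels(rowptr, colidx):
    """Inspector: compute the wavefront (level) of each row of a sparse
    lower-triangular matrix in CSR form. Row i depends on every row j < i
    with a stored entry in column j, so level(i) = 1 + max(level(j))."""
    n = len(rowptr) - 1
    level = np.zeros(n, dtype=int)
    for i in range(n):
        for p in range(rowptr[i], rowptr[i + 1]):
            j = colidx[p]
            if j < i:  # strictly-lower entries carry dependences
                level[i] = max(level[i], level[j] + 1)
    return level

def solve_by_wavefronts(rowptr, colidx, vals, b):
    """Executor: forward substitution where all rows in the same level are
    independent and could run in parallel (shown serially for clarity)."""
    n = len(b)
    level = wavefront_levels(rowptr, colidx)
    x = np.zeros(n)
    for lvl in range(level.max() + 1):
        for i in np.flatnonzero(level == lvl):  # parallelizable loop
            s, diag = b[i], 1.0
            for p in range(rowptr[i], rowptr[i + 1]):
                j = colidx[p]
                if j < i:
                    s -= vals[p] * x[j]
                else:
                    diag = vals[p]  # diagonal entry of row i
            x[i] = s / diag
    return x
```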
Uninterpreted functions in a polyhedral framework
Non-affine Extensions to Polyhedral Code Generation Anand Venkat, Manu Shantharam, Mary Hall, and Michelle Mills Strout CGO 2014 https://doi.org/10.1145/2581122.2544141
Data Transformations in a Polyhedral Framework
Loop and data transformations for sparse matrix code Anand Venkat, Mary Hall, and Michelle Strout PLDI 2015 https://doi.org/10.1145/2737924.2738003
NNCP algorithm
Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations. Anh-Huy Phan, Petr Tichavsky, and Andrzej Cichocki. IEEE Transactions on Signal Processing 2013
NNCP algorithm
Nesterov-based parallel algorithm for large-scale nonnegative tensor factorization. Athanasios P Liavas, Georgios Kostoulas, Georgios Lourakis, Kejun Huang, and Nicholas D Sidiropoulos. ICASSP 2017
Storage formats for tensor operations: GPUs
A Unified Optimization Approach for Sparse Tensor Operations on GPUs B. Liu, C. Wen, A. Sarwate, and M. Mehri Dehnavi IEEE Cluster 2017
Sparse tensor optimizations for tensor decompositions used for sparse count data
Memory-efficient parallel tensor decompositions Baskaran, M., Henretty, T., Pradelle, B., Langston, M. H., Bruns-Smith, D., Ezick, J., Lethin, R HPEC 2017 https://www.reservoir.com/publication/memory-efficient-parallel-tensor-decompositions/
Mode-specific and mode-generic tensor formats in ENSIGN
Efficient and scalable computations with sparse tensors Baskaran, M., Meister, B., Vasilache, N., & Lethin, R HPEC 2012 https://www.reservoir.com/publication/efficient-scalable-computations-sparse-tensors/
FROSTT
The Formidable Repository of Open Sparse Tensors and Tools Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis 2017 http://frostt.io/
Merge-based parallelism for load balancing
Merge-based Parallel Sparse Matrix-Vector Multiplication Duane Merrill and Michael Garland SC 2016 https://ieeexplore.ieee.org/document/7877136
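Merge-based SpMV treats CSR SpMV as merging two sorted lists, the row-end offsets rowptr[1:] and the nonzero indices 0..nnz-1, and assigns every thread an equal slice of the combined list via binary search along merge-path diagonals, so load balance is independent of the row-length distribution. A rough Python sketch of the idea, with the threads simulated serially and the carry handling simplified relative to the paper:

```python
import numpy as np

def merge_path_search(diag, rowend, nnz):
    """Binary search along a merge-path diagonal: returns (rows, nonzeros)
    consumed so far, with rows + nonzeros == diag."""
    lo, hi = max(0, diag - nnz), min(diag, len(rowend))
    while lo < hi:
        mid = (lo + hi) // 2
        if rowend[mid] <= diag - mid - 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, diag - lo

def merge_spmv(rowptr, colidx, vals, x, num_threads=4):
    """CSR SpMV with merge-path load balancing: each 'thread' gets an
    equal share of (rows + nonzeros) regardless of row lengths."""
    n, nnz = len(rowptr) - 1, len(vals)
    rowend = rowptr[1:]
    y = np.zeros(n)
    total = n + nnz
    per = -(-total // num_threads)          # ceiling division
    for t in range(num_threads):            # one iteration per thread
        r, z = merge_path_search(min(t * per, total), rowend, nnz)
        r1, z1 = merge_path_search(min((t + 1) * per, total), rowend, nnz)
        acc = 0.0
        while z < z1:                       # merge: row end or nonzero
            if r < r1 and rowend[r] <= z:
                y[r] += acc                 # row r ends here
                acc, r = 0.0, r + 1
            else:
                acc += vals[z] * x[colidx[z]]
                z += 1
        while r < r1:                       # trailing (possibly empty) rows
            y[r] += acc
            acc, r = 0.0, r + 1
        if acc:                             # carry into a row split across threads
            y[r] += acc
    return y
```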
Cyclops tensor framework (distributed memory tensor algebra)
A massively parallel tensor contraction framework for coupled-cluster computations Edgar Solomonik, Devin Matthews, Jeff R. Hammond, John F. Stanton, and James Demmel JPDC 2014 http://www.sciencedirect.com/science/article/pii/S074373151400104X
Sparse tensor contractions (matrix multiplication) and graph algorithms with Cyclops
Scaling betweenness centrality using communication-efficient sparse matrix multiplication Edgar Solomonik, Maciej Besta, Flavio Vella, and Torsten Hoefler SC 2017 https://arxiv.org/abs/1609.07008
TACO system
The Tensor Algebra Compiler Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe OOPSLA 2017 http://fredrikbk.com/oopsla17.html
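taco compiles a tensor index expression plus per-dimension format annotations into a single fused loop nest. For y(i) = A(i,j) * x(j) with A stored as CSR (dense rows, compressed columns), the generated kernel has roughly the following shape, rendered here in Python for readability; taco itself emits C code, and pos/crd are the names the papers use for the per-level arrays:

```python
def spmv_csr(n, A_pos, A_crd, A_vals, x):
    """CSR SpMV loop nest in the shape taco emits: the i loop walks the
    dense row dimension; the inner loop walks only the stored columns."""
    y = [0.0] * n
    for i in range(n):                           # dense level: rows
        for p in range(A_pos[i], A_pos[i + 1]):  # compressed level: columns
            y[i] += A_vals[p] * x[A_crd[p]]
    return y
```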
Communication lower bounds for MTTKRP
Communication lower bounds for matricized tensor times khatri-rao product. Grey Ballard, Nicholas Knight, and Kathryn Rouse. IPDPS 2018 https://arxiv.org/pdf/1708.07401.pdf
Specify sparse computations for FPGAs and CGRAs
Expressing Sparse Matrix Computations for Productive Performance on Spatial Architectures Hongbo Rong arXiv 2018 https://arxiv.org/abs/1810.07517
HBL upper bounds
Parallelepipeds obtaining HBL lower bounds. James Demmel and Alex Rusciano CoRR 2016 https://arxiv.org/pdf/1611.05944.pdf
Cache blocking for Sparse Tensors
Blocking Optimization Strategies for Sparse Tensor Computation. Jee Choi, Xing Liu, Shaden Smith, and Tyler Simon IPDPS 2018 https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxqZWV3aGFuY2hvaXBlcnNvbmFsfGd4OjZkNDBlMmU0NGYxZWQzYzI
HiCOO format
HiCOO: Hierarchical Storage of Sparse Tensors Jiajia Li, Jimeng Sun, Richard Vuduc SC 2018 http://fruitfly1026.github.io/static/files/sc18-li.pdf
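HiCOO compresses COO by grouping nonzeros into small multi-dimensional blocks: block coordinates are stored once per block at full width, while each nonzero keeps only narrow (8-bit) within-block offsets. A simplified, illustrative conversion sketch in Python; the layout details here are ours, not the paper's exact format:

```python
import numpy as np

def coo_to_hicoo_like(inds, vals, block_bits=7):
    """Group COO nonzeros into 2^block_bits-wide blocks per mode. Each block
    stores its base coordinates once; each nonzero keeps uint8 offsets."""
    B = 1 << block_bits                      # block width; <= 256 fits uint8
    inds = np.asarray(inds)                  # shape (nnz, nmodes), integer
    blocks = inds >> block_bits              # block coordinate per nonzero
    offsets = (inds & (B - 1)).astype(np.uint8)
    # Sort so the nonzeros of each block are contiguous (mode 0 primary).
    order = np.lexsort(blocks.T[::-1])
    blocks = blocks[order]
    offsets, vals = offsets[order], np.asarray(vals)[order]
    # bptr[b] .. bptr[b+1] delimit the nonzeros of block b.
    change = np.any(np.diff(blocks, axis=0) != 0, axis=1)
    bptr = np.concatenate(([0], np.flatnonzero(change) + 1, [len(vals)]))
    binds = blocks[bptr[:-1]]                # one coordinate tuple per block
    return bptr, binds, offsets, vals
```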
PASTA benchmark suite
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite Jiajia Li, Yuchen Ma, Xiaolong Wu, Ang Li, Kevin Barker Technical Report 2019 http://fruitfly1026.github.io/static/files/pasta.pdf
graph processing frameworks
Ligra: A Lightweight Graph Processing Framework for Shared Memory Julian Shun and Guy Blelloch PPoPP 2013 https://people.csail.mit.edu/jshun/ligra.pdf
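Ligra's central abstraction is edgeMap: given a graph, a vertex subset (the frontier), an update function, and a condition, it applies the update along edges leaving the frontier and returns the set of vertices where the update succeeded. A minimal Python sketch of these semantics; Ligra itself is a C++ framework that also switches between sparse and dense traversal, which we omit:

```python
def edge_map(graph, frontier, update, cond):
    """Apply update(u, v) over edges (u, v) with u in the frontier and
    cond(v) true; return the vertices where update returned True."""
    out = set()
    for u in frontier:
        for v in graph[u]:                   # graph: adjacency lists
            if cond(v) and update(u, v):
                out.add(v)
    return out

def bfs(graph, root):
    """Breadth-first search written against edge_map, Ligra-style."""
    parent = {root: root}
    def update(u, v):
        return parent.setdefault(v, u) == u  # claim v exactly once
    frontier = {root}
    while frontier:
        frontier = edge_map(graph, frontier, update,
                            cond=lambda v: v not in parent)
    return parent
```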
Code generation for sparse matrix methods
ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism K. Cheshmi, S. Kamil, M. Strout, and M. Mehri Dehnavi SC 2018
Code generation for sparse matrix methods
Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis K. Cheshmi, S. Kamil, M. Strout, and M. Mehri Dehnavi SC 2017
Data Transformations in Real-world Code
Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action Khalid Ahmad, Anand Venkat, Mary W. Hall LCPC 2016 https://doi.org/10.1007/978-3-319-52709-3_17
Dense MTTKRP
Shared-memory parallelization of MTTKRP for dense tensors. Koby Hayashi, Grey Ballard, Yujie Jiang, and Michael J. Tobia. PPoPP 2018
graph processing frameworks
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable Laxman Dhulipala, Guy Blelloch, and Julian Shun SPAA 2018 https://arxiv.org/abs/1805.05208
graph processing frameworks
Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing Laxman Dhulipala, Guy Blelloch, and Julian Shun SPAA 2017 https://people.csail.mit.edu/jshun/bucketing.pdf
Generalized CP streaming framework in ENSIGN
Computationally Efficient CP Tensor Decomposition Update Framework for Emerging Component Discovery in Streaming Data Letourneau, P.D., Baskaran, M., Henretty, T., Ezick, J. and Lethin, R HPEC 2018 https://www.reservoir.com/publication/cp-tensor-decomposition-update-framework/
alternating least squares for tensor decomposition
Accelerating alternating least squares for tensor decomposition by pairwise perturbation Linjian Ma and Edgar Solomonik arXiv 2018 https://arxiv.org/abs/1811.10573
Survey/overview
The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code M. M. Strout, M. Hall and C. Olschanowsky Proc. of the IEEE 2018 https://doi.org/10.1109/JPROC.2018.2857721
Loop upper bounds
Communication-Optimal Loops Nick Knight PhD Thesis 2015
Finding Good Block Sizes In Constant Time
A Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats Peter Ahrens, Helen Xu, Nicholas Schiefer IPDPS 2018 https://doi.org/10.1109/IPDPS.2018.00064
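Here "fill" is the ratio between the entries a blocked format must store (explicit zeros included, since every touched block is stored densely) and the actual nonzeros; the paper's contribution is estimating it quickly rather than computing it exactly. For reference, the exact quantity being estimated can be computed naively as follows; this is our own rendering of the definition, not the paper's algorithm:

```python
import numpy as np

def exact_fill(coords, block):
    """Exact fill of a blocked sparse format: stored entries / nonzeros.
    coords: iterable of distinct index tuples; block: block size per mode.
    O(nnz) time; the cited paper estimates this much faster by sampling."""
    touched = {tuple(c // b for c, b in zip(pt, block)) for pt in coords}
    stored = len(touched) * int(np.prod(block))
    return stored / len(coords)
```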
CSF formalization
Tensor-Matrix Products with a Compressed Sparse Tensor. Shaden Smith, George Karypis IA^3 2015 http://shaden.io/pub-files/smith2015csf.pdf
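CSF stores a sparse tensor as a forest: each level keeps an fids array of indices and an fptr array delimiting each node's children, so shared index prefixes are stored once instead of per nonzero. A small Python sketch of a 3-mode CSF traversal; the fptr/fids naming follows the SPLATT line of work, but the exact layout here is simplified:

```python
# A 3-mode tensor in CSF: fids[0] lists the distinct mode-0 indices;
# fptr[0][s] .. fptr[0][s+1] delimit the mode-1 fibers under slice s;
# fptr[1][f] .. fptr[1][f+1] delimit the mode-2 entries under fiber f.
def csf_iter_nonzeros(fids, fptr, vals):
    """Walk a 3-level CSF tree and yield (i, j, k, value) tuples."""
    for si in range(len(fids[0])):                      # slices (mode 0)
        i = fids[0][si]
        for fj in range(fptr[0][si], fptr[0][si + 1]):  # fibers (mode 1)
            j = fids[1][fj]
            for p in range(fptr[1][fj], fptr[1][fj + 1]):  # entries (mode 2)
                yield i, j, fids[2][p], vals[p]
```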
TTMc optimization
Accelerating the Tucker Decomposition with Compressed Sparse Tensors Shaden Smith, George Karypis Euro-Par 2017 http://shaden.io/pub-files/smith2017tucker.pdf
SPLATT - CSF data structure and MTTKRP
SPLATT: Efficient and parallel sparse tensor-matrix multiplication. Shaden Smith, Niranjay Ravindran, Nicholas D Sidiropoulos, and George Karypis IPDPS 2015 http://shaden.io/pub-files/smith2015splatt.pdf
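The MTTKRP kernel these papers optimize multiplies the matricized tensor by the Khatri-Rao product of the other factor matrices: each nonzero x_ijk adds x_ijk times the elementwise product of B[j,:] and C[k,:] into row i of the output. A direct COO formulation is sketched below; SPLATT's CSF kernel improves on it by accumulating the C contributions per fiber and multiplying by B[j,:] once per fiber rather than once per nonzero. A minimal sketch, not SPLATT's implementation:

```python
import numpy as np

def mttkrp_coo(inds, vals, B, C, nrows):
    """Mode-0 MTTKRP over COO nonzeros. inds: list of (i, j, k) tuples;
    B, C: factor matrices with R columns; returns the nrows-by-R output."""
    R = B.shape[1]
    M = np.zeros((nrows, R))
    for (i, j, k), v in zip(inds, vals):
        M[i, :] += v * (B[j, :] * C[k, :])   # elementwise row product
    return M
```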
Generating code for computing with disparate tensor formats
Format Abstraction for Sparse Tensor Algebra Compilers Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe OOPSLA 2018 https://dl.acm.org/citation.cfm?doid=3288538.3276493
High-performance Sparse Graph Computations
GraphIt: A High-Performance Graph DSL Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe OOPSLA 2018 https://dl.acm.org/citation.cfm?id=3276491
Storage formats for tensor operations: Cloud
CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms Z. Blanco, B. Liu, and M. Mehri Dehnavi ICPP 2018

Venue

Room 32-155
MIT Ray and Maria Stata Center
32 Vassar St
Cambridge, MA 02139