MIT Stata Center

Invited Workshop on Compiler Techniques for Sparse Tensor Algebra

Cambridge, MA, January 26th

Organizers: Fredrik Kjolstad and Saman Amarasinghe

There has been considerable recent interest and innovation in compiler technology for high-performance sparse tensor computation. This workshop will bring together relevant parties from academia and industry to discuss individual approaches, how they relate, and whether the ideas can be combined.

Program

There will be eight sessions with three or four speakers each. Each session consists of ten-minute talks followed by a twenty-minute discussion among the speakers and the audience.
Time Session
8:00 - 8:20 Registration
8:20 - 8:30 Welcome
8:30 - 9:20 Sparse Algorithms and Data Structures I
Chaired by Saman Amarasinghe (MIT)
Jiajia Li (Pacific Northwest National Laboratory)
Shaden Smith (Intel Labs)
Edgar Solomonik (University of Illinois at Urbana-Champaign)
9:20 - 9:30 Break
9:30 - 10:20 Sparse Algorithms and Data Structures II
Chaired by Maryam Dehnavi (University of Toronto)
Muthu Baskaran (Reservoir Labs)
Jee Choi (University of Oregon)
Duane Merrill and Michael Garland (NVIDIA Research)
10:20 - 10:30 Break
10:30 - 11:20 Sparse Polyhedral Optimization
Chaired by Shoaib Kamil (Adobe Research)
Mary Hall (University of Utah)
Cathie Olschanowsky (Boise State University)
Michelle Strout (University of Arizona)
11:20 - 11:30 Break
11:30 - 12:30 Tensor Compilers I
Chaired by Stephen Chou (MIT)
Dougal Maclaurin (Google)
P. (Saday) Sadayappan (Ohio State University)
Hongbo Rong (Intel Labs)
Keno Fischer (Julia Computing)
12:30 - 1:30 Lunch
1:30 - 2:20 Tensor Compilers II
Chaired by Chris Fletcher (University of Illinois at Urbana-Champaign)
Maryam Dehnavi (University of Toronto)
Fredrik Kjolstad (MIT)
Stephen Chou (MIT)
2:20 - 2:35 Break
2:35 - 3:25 Tensors and Graphs
Chaired by Fredrik Kjolstad (MIT)
Justin Solomon (MIT)
Jeremy Kepner (Lincoln Laboratory)
Julian Shun (MIT)
3:25 - 3:40 Break
3:40 - 4:30 Hardware for Sparse Tensor Algebra
Chaired by Joel Emer (NVIDIA Research and MIT)
Michael Pellauer (NVIDIA Research)
Hanrui Wang (MIT)
Chris Fletcher (University of Illinois at Urbana-Champaign)
4:30 - 4:45 Break
4:45 - 5:35 Applications in Science and Engineering
Chaired by Peter Ahrens (MIT)
Srinivas Eswar (Georgia Tech)
Richard Mills (Argonne National Laboratory)
Miles Stoudenmire (Flatiron Institute)
5:35 - 6:00 Discussion
6:00 - 8:00 Dinner and Poster Session

Papers

The following papers were suggested by workshop participants.
Run-time parallelization in a polyhedral framework
Automating wavefront parallelization for sparse matrix computations Anand Venkat, Mahdi Soltan Mohammadi, Jongsoo Park, Hongbo Rong, Rajkishore Barik, Michelle Mills Strout, and Mary Hall SC 2016 https://ieeexplore.ieee.org/document/7877119/
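The core idea in the paper above is the inspector-executor pattern: an inspector runs at execution time, walks the sparsity structure, and groups loop iterations into dependence levels (wavefronts); the executor then runs each level in parallel. A minimal Python sketch for sparse lower-triangular solve over CSR, with our own function names rather than anything from the paper:

```python
import numpy as np

def wavefront_levels(rowptr, colidx):
    """Inspector: compute the wavefront (level) of each row of a sparse
    lower-triangular matrix in CSR form. Row i depends on every row j < i
    with a stored entry in column j, so level(i) = 1 + max(level(j))."""
    n = len(rowptr) - 1
    level = np.zeros(n, dtype=int)
    for i in range(n):
        for p in range(rowptr[i], rowptr[i + 1]):
            j = colidx[p]
            if j < i:  # strictly-lower entries carry dependences
                level[i] = max(level[i], level[j] + 1)
    return level

def solve_by_wavefronts(rowptr, colidx, vals, b):
    """Executor: forward substitution where all rows in the same level are
    independent and could run in parallel (shown serially for clarity)."""
    n = len(b)
    level = wavefront_levels(rowptr, colidx)
    x = np.zeros(n)
    for lvl in range(level.max() + 1):
        for i in np.flatnonzero(level == lvl):  # parallelizable loop
            s, diag = b[i], 1.0
            for p in range(rowptr[i], rowptr[i + 1]):
                j = colidx[p]
                if j < i:
                    s -= vals[p] * x[j]
                else:
                    diag = vals[p]  # diagonal entry of row i
            x[i] = s / diag
    return x
```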
Uninterpreted functions in a polyhedral framework
Non-affine Extensions to Polyhedral Code Generation Anand Venkat, Manu Shantharam, Mary Hall, and Michelle Mills Strout CGO 2014 https://doi.org/10.1145/2581122.2544141
Data Transformations in a Polyhedral Framework
Loop and data transformations for sparse matrix code Anand Venkat, Mary Hall, and Michelle Strout PLDI 2015 https://doi.org/10.1145/2737924.2738003
NNCP algorithm
Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations. Anh-Huy Phan, Petr Tichavsky, and Andrzej Cichocki. IEEE Transactions on Signal Processing 2013
NNCP algorithm
Nesterov-based parallel algorithm for large-scale nonnegative tensor factorization. Athanasios P Liavas, Georgios Kostoulas, Georgios Lourakis, Kejun Huang, and Nicholas D Sidiropoulos. ICASSP 2017
Storage formats for tensor operations: GPUs
A Unified Optimization Approach for Sparse Tensor Operations on GPUs B. Liu, C. Wen, A. Sarwate, and M. Mehri Dehnavi IEEE Cluster 2017
Sparse tensor optimizations for tensor decompositions used for sparse count data
Memory-efficient parallel tensor decompositions Baskaran, M., Henretty, T., Pradelle, B., Langston, M. H., Bruns-Smith, D., Ezick, J., Lethin, R HPEC 2017 https://www.reservoir.com/publication/memory-efficient-parallel-tensor-decompositions/
Mode-specific and mode-generic tensor formats in ENSIGN
Efficient and scalable computations with sparse tensors Baskaran, M., Meister, B., Vasilache, N., & Lethin, R HPEC 2012 https://www.reservoir.com/publication/efficient-scalable-computations-sparse-tensors/
FROSTT
The Formidable Repository of Open Sparse Tensors and Tools Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis 2017 http://frostt.io/
Merge-based parallelism for load balancing
Merge-based Parallel Sparse Matrix-Vector Multiplication Duane Merrill and Michael Garland SC 2016 https://ieeexplore.ieee.org/document/7877136
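Merge-based SpMV treats CSR SpMV as merging two sorted lists, the row-end offsets rowptr[1:] and the nonzero indices 0..nnz-1, and assigns every thread an equal slice of the combined list via binary search along merge-path diagonals, so load balance is independent of the row-length distribution. A rough Python sketch of the idea, with the threads simulated serially and the carry handling simplified relative to the paper:

```python
import numpy as np

def merge_path_search(diag, rowend, nnz):
    """Binary search along a merge-path diagonal: returns (rows, nonzeros)
    consumed so far, with rows + nonzeros == diag."""
    lo, hi = max(0, diag - nnz), min(diag, len(rowend))
    while lo < hi:
        mid = (lo + hi) // 2
        if rowend[mid] <= diag - mid - 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, diag - lo

def merge_spmv(rowptr, colidx, vals, x, num_threads=4):
    """CSR SpMV with merge-path load balancing: each 'thread' gets an
    equal share of (rows + nonzeros) regardless of row lengths."""
    n, nnz = len(rowptr) - 1, len(vals)
    rowend = rowptr[1:]
    y = np.zeros(n)
    total = n + nnz
    per = -(-total // num_threads)          # ceiling division
    for t in range(num_threads):            # one iteration per thread
        r, z = merge_path_search(min(t * per, total), rowend, nnz)
        r1, z1 = merge_path_search(min((t + 1) * per, total), rowend, nnz)
        acc = 0.0
        while z < z1:                       # merge: row end or nonzero
            if r < r1 and rowend[r] <= z:
                y[r] += acc                 # row r ends here
                acc, r = 0.0, r + 1
            else:
                acc += vals[z] * x[colidx[z]]
                z += 1
        while r < r1:                       # trailing (possibly empty) rows
            y[r] += acc
            acc, r = 0.0, r + 1
        if acc:                             # carry into a row split across threads
            y[r] += acc
    return y
```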
Cyclops tensor framework (distributed memory tensor algebra)
A massively parallel tensor contraction framework for coupled-cluster computations Edgar Solomonik, Devin Matthews, Jeff R. Hammond, John F. Stanton, and James Demmel JPDC 2014 http://www.sciencedirect.com/science/article/pii/S074373151400104X
Sparse tensor contractions (matrix multiplication) and graph algorithms with Cyclops
Scaling betweenness centrality using communication-efficient sparse matrix multiplication Edgar Solomonik, Maciej Besta, Flavio Vella, and Torsten Hoefler SC 2017 https://arxiv.org/abs/1609.07008
TACO system
The Tensor Algebra Compiler Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe OOPSLA 2017 http://fredrikbk.com/oopsla17.html
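taco compiles a tensor index expression plus per-dimension format annotations into a single fused loop nest. For y(i) = A(i,j) * x(j) with A stored as CSR (dense rows, compressed columns), the generated kernel has roughly the following shape, rendered here in Python for readability; taco itself emits C code, and pos/crd are the names the papers use for the per-level arrays:

```python
def spmv_csr(n, A_pos, A_crd, A_vals, x):
    """CSR SpMV loop nest in the shape taco emits: the i loop walks the
    dense row dimension; the inner loop walks only the stored columns."""
    y = [0.0] * n
    for i in range(n):                           # dense level: rows
        for p in range(A_pos[i], A_pos[i + 1]):  # compressed level: columns
            y[i] += A_vals[p] * x[A_crd[p]]
    return y
```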
Communication lower bounds for MTTKRP
Communication lower bounds for matricized tensor times khatri-rao product. Grey Ballard, Nicholas Knight, and Kathryn Rouse. IPDPS 2018 https://arxiv.org/pdf/1708.07401.pdf
Specify sparse computations for FPGAs and CGRAs
Expressing Sparse Matrix Computations for Productive Performance on Spatial Architectures Hongbo Rong arXiv 2018 https://arxiv.org/abs/1810.07517
HBL upper bounds
Parallelepipeds obtaining HBL lower bounds. James Demmel and Alex Rusciano CoRR 2016 https://arxiv.org/pdf/1611.05944.pdf
Cache blocking for Sparse Tensors
Blocking Optimization Strategies for Sparse Tensor Computation. Jee Choi, Xing Liu, Shaden Smith, and Tyler Simon IPDPS 2018 https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxqZWV3aGFuY2hvaXBlcnNvbmFsfGd4OjZkNDBlMmU0NGYxZWQzYzI
HiCOO format
HiCOO: Hierarchical Storage of Sparse Tensors Jiajia Li, Jimeng Sun, Richard Vuduc SC 2018 http://fruitfly1026.github.io/static/files/sc18-li.pdf
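HiCOO compresses COO by grouping nonzeros into small multi-dimensional blocks: block coordinates are stored once per block at full width, while each nonzero keeps only narrow (8-bit) within-block offsets. A simplified, illustrative conversion sketch in Python; the layout details here are ours, not the paper's exact format:

```python
import numpy as np

def coo_to_hicoo_like(inds, vals, block_bits=7):
    """Group COO nonzeros into 2^block_bits-wide blocks per mode. Each block
    stores its base coordinates once; each nonzero keeps uint8 offsets."""
    B = 1 << block_bits                      # block width; <= 256 fits uint8
    inds = np.asarray(inds)                  # shape (nnz, nmodes), integer
    blocks = inds >> block_bits              # block coordinate per nonzero
    offsets = (inds & (B - 1)).astype(np.uint8)
    # Sort so the nonzeros of each block are contiguous (mode 0 primary).
    order = np.lexsort(blocks.T[::-1])
    blocks = blocks[order]
    offsets, vals = offsets[order], np.asarray(vals)[order]
    # bptr[b] .. bptr[b+1] delimit the nonzeros of block b.
    change = np.any(np.diff(blocks, axis=0) != 0, axis=1)
    bptr = np.concatenate(([0], np.flatnonzero(change) + 1, [len(vals)]))
    binds = blocks[bptr[:-1]]                # one coordinate tuple per block
    return bptr, binds, offsets, vals
```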
PASTA benchmark suite
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite Jiajia Li, Yuchen Ma, Xiaolong Wu, Ang Li, Kevin Barker Technical Report 2019 http://fruitfly1026.github.io/static/files/pasta.pdf
graph processing frameworks
Ligra: A Lightweight Graph Processing Framework for Shared Memory Julian Shun and Guy Blelloch PPoPP 2013 https://people.csail.mit.edu/jshun/ligra.pdf
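Ligra's central abstraction is edgeMap: given a graph, a vertex subset (the frontier), an update function, and a condition, it applies the update along edges leaving the frontier and returns the set of vertices where the update succeeded. A minimal Python sketch of these semantics; Ligra itself is a C++ framework that also switches between sparse and dense traversal, which we omit:

```python
def edge_map(graph, frontier, update, cond):
    """Apply update(u, v) over edges (u, v) with u in the frontier and
    cond(v) true; return the vertices where update returned True."""
    out = set()
    for u in frontier:
        for v in graph[u]:                   # graph: adjacency lists
            if cond(v) and update(u, v):
                out.add(v)
    return out

def bfs(graph, root):
    """Breadth-first search written against edge_map, Ligra-style."""
    parent = {root: root}
    def update(u, v):
        return parent.setdefault(v, u) == u  # claim v exactly once
    frontier = {root}
    while frontier:
        frontier = edge_map(graph, frontier, update,
                            cond=lambda v: v not in parent)
    return parent
```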
Code generation for sparse matrix methods
ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism K. Cheshmi, S. Kamil, M. Strout, and M. Mehri Dehnavi SC 2018
Code generation for sparse matrix methods
Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis K. Cheshmi, S. Kamil, M. Strout, and M. Mehri Dehnavi SC 2017
Data Transformations in Real-world Code
Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action Khalid Ahmad, Anand Venkat, Mary W. Hall LCPC 2016 https://doi.org/10.1007/978-3-319-52709-3_17
Dense MTTKRP
Shared-memory parallelization of MTTKRP for dense tensors. Koby Hayashi, Grey Ballard, Yujie Jiang, and Michael J. Tobia. PPoPP 2018
graph processing frameworks
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable Laxman Dhulipala, Guy Blelloch, and Julian Shun SPAA 2018 https://arxiv.org/abs/1805.05208
graph processing frameworks
Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing Laxman Dhulipala, Guy Blelloch, and Julian Shun SPAA 2017 https://people.csail.mit.edu/jshun/bucketing.pdf
Generalized CP streaming framework in ENSIGN
Computationally Efficient CP Tensor Decomposition Update Framework for Emerging Component Discovery in Streaming Data Letourneau, P.D., Baskaran, M., Henretty, T., Ezick, J. and Lethin, R HPEC 2018 https://www.reservoir.com/publication/cp-tensor-decomposition-update-framework/
alternating least squares for tensor decomposition
Accelerating alternating least squares for tensor decomposition by pairwise perturbation Linjian Ma and Edgar Solomonik arXiv 2018 https://arxiv.org/abs/1811.10573
Survey/overview
The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code M. M. Strout, M. Hall and C. Olschanowsky Proc. of the IEEE 2018 https://doi.org/10.1109/JPROC.2018.2857721
Loop upper bounds
Communication-Optimal Loops Nick Knight PhD Thesis 2015
Finding Good Block Sizes In Constant Time
A Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats Peter Ahrens, Helen Xu, Nicholas Schiefer IPDPS 2018 https://doi.org/10.1109/IPDPS.2018.00064
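Here "fill" is the ratio between the entries a blocked format must store (explicit zeros included, since every touched block is stored densely) and the actual nonzeros; the paper's contribution is estimating it quickly rather than computing it exactly. For reference, the exact quantity being estimated can be computed naively as follows; this is our own rendering of the definition, not the paper's algorithm:

```python
import numpy as np

def exact_fill(coords, block):
    """Exact fill of a blocked sparse format: stored entries / nonzeros.
    coords: iterable of distinct index tuples; block: block size per mode.
    O(nnz) time; the cited paper estimates this much faster by sampling."""
    touched = {tuple(c // b for c, b in zip(pt, block)) for pt in coords}
    stored = len(touched) * int(np.prod(block))
    return stored / len(coords)
```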
CSF formalization
Tensor-Matrix Products with a Compressed Sparse Tensor. Shaden Smith, George Karypis IA^3 2015 http://shaden.io/pub-files/smith2015csf.pdf
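CSF stores a sparse tensor as a forest: each level keeps an fids array of indices and an fptr array delimiting each node's children, so shared index prefixes are stored once instead of per nonzero. A small Python sketch of a 3-mode CSF traversal; the fptr/fids naming follows the SPLATT line of work, but the exact layout here is simplified:

```python
# A 3-mode tensor in CSF: fids[0] lists the distinct mode-0 indices;
# fptr[0][s] .. fptr[0][s+1] delimit the mode-1 fibers under slice s;
# fptr[1][f] .. fptr[1][f+1] delimit the mode-2 entries under fiber f.
def csf_iter_nonzeros(fids, fptr, vals):
    """Walk a 3-level CSF tree and yield (i, j, k, value) tuples."""
    for si in range(len(fids[0])):                      # slices (mode 0)
        i = fids[0][si]
        for fj in range(fptr[0][si], fptr[0][si + 1]):  # fibers (mode 1)
            j = fids[1][fj]
            for p in range(fptr[1][fj], fptr[1][fj + 1]):  # entries (mode 2)
                yield i, j, fids[2][p], vals[p]
```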
TTMc optimization
Accelerating the Tucker Decomposition with Compressed Sparse Tensors Shaden Smith, George Karypis Euro-Par 2017 http://shaden.io/pub-files/smith2017tucker.pdf
SPLATT - CSF data structure and MTTKRP
SPLATT: Efficient and parallel sparse tensor-matrix multiplication. Shaden Smith, Niranjay Ravindran, Nicholas D Sidiropoulos, and George Karypis IPDPS 2015 http://shaden.io/pub-files/smith2015splatt.pdf
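The MTTKRP kernel these papers optimize multiplies the matricized tensor by the Khatri-Rao product of the other factor matrices: each nonzero x_ijk adds x_ijk times the elementwise product of B[j,:] and C[k,:] into row i of the output. A direct COO formulation is sketched below; SPLATT's CSF kernel improves on it by accumulating the C contributions per fiber and multiplying by B[j,:] once per fiber rather than once per nonzero. A minimal sketch, not SPLATT's implementation:

```python
import numpy as np

def mttkrp_coo(inds, vals, B, C, nrows):
    """Mode-0 MTTKRP over COO nonzeros. inds: list of (i, j, k) tuples;
    B, C: factor matrices with R columns; returns the nrows-by-R output."""
    R = B.shape[1]
    M = np.zeros((nrows, R))
    for (i, j, k), v in zip(inds, vals):
        M[i, :] += v * (B[j, :] * C[k, :])   # elementwise row product
    return M
```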
Generating code for computing with disparate tensor formats
Format Abstraction for Sparse Tensor Algebra Compilers Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe OOPSLA 2018 https://dl.acm.org/citation.cfm?doid=3288538.3276493
High-performance Sparse Graph Computations
GraphIt: A High-Performance Graph DSL Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe OOPSLA 2018 https://dl.acm.org/citation.cfm?id=3276491
Storage formats for tensor operations: Cloud
CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms Z. Blanco, B. Liu, and M. Mehri Dehnavi ICPP 2018

Venue

Room 32-155
MIT Ray and Maria Stata Center
32 Vassar St
Cambridge, MA 02139