1. Introduction
2. Array Programming Fundamentals
   2.1. What is an Array?
   2.2. Basic Operators
   2.3. Broadcasting
   2.4. Slicing
   2.5. Indexing
   2.6. Reshaping And Transposing
   2.7. Einsums
   2.8. Practice: Implementing An LLM's Forward Pass
3. ML Compilers
   3.1. JAX vs PyTorch
   3.2. Eager Mode
   3.3. Optimizations
4. Backward Pass
5. On-Chip Parallelism
6. Estimating Performance
   6.1. How to Compute It?
   6.2. Practical Example
   6.3. Roofline Model
   6.4. Practice Questions
7. Distributed Computations
   7.1. Distributed Ops
      7.1.1. All-Gather
      7.1.2. All-Reduce and Reduce-Scatter
      7.1.3. All-To-All
   7.2. Sharding Strategies
      7.2.1. Data Parallelism
      7.2.2. Pipelining
      7.2.3. Fully Sharded Data Parallel (FSDP)
      7.2.4. Tensor Parallelism
      7.2.5. Practice Questions
8. LLM Serving Optimizations
   8.1. Quality Neutral
      8.1.1. KV Caching
      8.1.2. Disaggregated Serving
      8.1.3. Speculative Decoding
      8.1.4. Flash Attention
   8.2. Quality Detrimental
      8.2.1. Quantization
9. Mixture of Experts (MoE)
   9.1. Expert Sharding
   9.2. Expert Imbalance (TODO)
10. Credits
ML Performance
Distributed Operations
Let's first review the three most important distributed operations.
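Before looking at each collective in detail, it can help to see their semantics in isolation. The sketch below simulates the collectives covered in this chapter (all-gather, all-reduce, reduce-scatter, and all-to-all) with plain NumPy over a list of per-device shards. This is only a single-process model of what each operation computes; real implementations move these shards over an interconnect, and the function names here are illustrative, not a real API.

```python
import numpy as np

def all_gather(shards):
    # Every device ends up with the concatenation of all shards.
    full = np.concatenate(shards)
    return [full.copy() for _ in shards]

def all_reduce(shards):
    # Every device ends up with the elementwise sum over all shards.
    total = sum(shards)
    return [total.copy() for _ in shards]

def reduce_scatter(shards):
    # Sum elementwise first, then each device keeps only its slice of the result.
    total = sum(shards)
    return list(np.split(total, len(shards)))

def all_to_all(shards):
    # Device i sends its j-th chunk to device j: a distributed transpose.
    n = len(shards)
    chunks = [np.split(s, n) for s in shards]
    return [np.concatenate([chunks[src][dst] for src in range(n)])
            for dst in range(n)]
```

For example, with two simulated devices holding `[1, 2]` and `[3, 4]`, `all_gather` leaves both with `[1, 2, 3, 4]`, `all_reduce` leaves both with `[4, 6]`, and `reduce_scatter` leaves device 0 with `[4]` and device 1 with `[6]` — which also shows why an all-reduce can be decomposed into a reduce-scatter followed by an all-gather.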