Roofline Model

The roofline model measures the peak throughput of the chip.

The Core Concept: Arithmetic Intensity

Arithmetic Intensity (AI): is a characteristic of your algorithm. It's the ratio of compute to memory access.

AI = flops / data_moved

The achievable performance will be expressed as

perf = min(peak_flops, peak_bandwidth * AI)

Takeaway

The key takeaway is that as long as we are memory bound, increasing the arithmetic intensity will yield significant throughput improvements. When we are compute bound, increasing the arithmetic intensity, will commensurably increase the latency, resulting in no throughput gains. Therefore, we always want to be compute bound for throughput, but any point on the "flat part" of the roofline will be as efficient as any other.

ML Performance

Roofline Model

The Core Concept: Arithmetic Intensity

Takeaway