Einsums
Einsums are the lifeblood of tensor arithmetic in ML. They provide a concise syntax for expressing high-dimensional tensor operations. Furthermore, they are often more efficient than a mix of traditional operators, because linear algebra libraries can reorder the underlying contractions to minimize the size of the materialized intermediates.
Syntax
We write an einsum using the `np.einsum(subscripts, *operands)` function.

- `subscripts` is a Python string defining the operation to apply to the operands. The string is formatted as `dims_1,dims_2,...->dims_out`:
  - We give a name to each dimension of each operand. For instance, a batch of images could be `bwh` (batch, width, height).
  - We separate operands with `,`. For instance `bwh,whd` (where `d` is the model dimension).
  - We specify the output dimensions after `->`. For instance `bwh,whd->bd`.
- `*operands` are an arbitrary number of arrays to which the operation will be applied. For instance `np.einsum('bwh,whd->bd', images, weights)`.
Understanding Einsums
- Repeated letters: if an index appears in two inputs (e.g., `j` in `ij,jk`), it implies multiplication along that dimension.
- Omitted letters (reduction): if an index appears in an input but not in the output, it is summed over (reduced).
- Output order: you can rearrange the output dimensions arbitrarily (e.g., `ij->ji` is a transpose).
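A quick sketch of these three rules in action, using small arrays with arbitrary values:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

# Repeated letter 'j' in the inputs: multiply along that dimension.
# 'j' omitted from the output: sum over it. Together, this is matrix mul.
matmul = np.einsum('ij,jk->ik', A, B)
assert np.allclose(matmul, A @ B)

# All letters omitted from the output: sum over everything.
total = np.einsum('ij->', A)
assert total == A.sum()

# Reordered output letters: 'ij->ji' is a transpose.
transposed = np.einsum('ij->ji', A)
assert np.array_equal(transposed, A.T)
```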
| Operation | Standard API | Einsum Notation |
|---|---|---|
| Transpose | A.T | ij -> ji |
| Sum | A.sum() | ij -> |
| Column Sum | A.sum(axis=0) | ij -> j |
| Dot Product | a @ b | i, i -> |
| Matrix Mul | A @ B | ik, kj -> ij |
| Batch MatMul | A @ B | bik, bkj -> bij |
| Outer Product | np.outer(a, b) | i, j -> ij |
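Every row of this table can be checked directly. A small sanity script, with arbitrary test shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
a = rng.normal(size=6)
b = rng.normal(size=6)
A_batched = rng.normal(size=(2, 3, 4))
B_batched = rng.normal(size=(2, 4, 5))

assert np.allclose(np.einsum('ij->ji', A), A.T)           # Transpose
assert np.allclose(np.einsum('ij->', A), A.sum())         # Sum
assert np.allclose(np.einsum('ij->j', A), A.sum(axis=0))  # Column sum
assert np.allclose(np.einsum('i,i->', a, b), a @ b)       # Dot product
assert np.allclose(np.einsum('ik,kj->ij', A, B), A @ B)   # Matrix mul
assert np.allclose(np.einsum('bik,bkj->bij', A_batched, B_batched),
                   A_batched @ B_batched)                 # Batch matmul
assert np.allclose(np.einsum('i,j->ij', a, b), np.outer(a, b))  # Outer
```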
Broadcasting with Ellipsis (...)
In Deep Learning, we often write code that shouldn't care about the number of batch dimensions (e.g., handling both (batch, sequence, feature) and (batch, sequence, num_heads, feature)).
einsum supports ... to represent "all other dimensions".
# Apply a linear layer (Weights: i, j) to a tensor
# of ANY shape ending in 'i'
# ...i, ij -> ...j
output = np.einsum('...i,ij->...j', input_tensor, weights)
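To see that the same subscript string handles any number of leading dimensions, here is a small check with arbitrary shapes:

```python
import numpy as np

weights = np.random.normal(size=(8, 16))  # (i, j)

# A 3D input: (batch, sequence, feature)
x3 = np.random.normal(size=(2, 5, 8))
# A 4D input: (batch, sequence, num_heads, feature)
x4 = np.random.normal(size=(2, 5, 4, 8))

# The exact same einsum string works for both ranks.
out3 = np.einsum('...i,ij->...j', x3, weights)
out4 = np.einsum('...i,ij->...j', x4, weights)

print(out3.shape)  # (2, 5, 16)
print(out4.shape)  # (2, 5, 4, 16)
```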
Code Examples
import numpy as np
batch = 10
width = 28
height = 64
d_model = 512
images = np.random.normal(size=(batch, width, height))
weights = np.random.normal(size=(width, height, d_model))
print(f"{np.einsum('bwh,whd->bd', images, weights).shape=}")
stdout
np.einsum('bwh,whd->bd', images, weights).shape=(10, 512)
This reduces both the width and the height. But we could also reduce only the width, keep the height as a batch dimension, and write the output in a different order: for instance `bwh,whd->dbh`.
stdout
np.einsum('bwh,whd->dbh', images, weights).shape=(512, 10, 64)
Path Optimizations
When multiplying three or more matrices, the order of operations matters significantly for memory.
(A @ B) @ C vs A @ (B @ C)
If A is (1000, 2), B is (2, 1000), and C is (1000, 1000):
- `A @ B` creates a `(1000, 1000)` intermediate matrix (1M elements).
- `B @ C` creates a `(2, 1000)` intermediate matrix (2k elements).
The second path is orders of magnitude more memory efficient. np.einsum (with optimize=True) automatically finds this path.
import numpy as np
# A chain of 3 matrix multiplications
# Dimensions chosen to make one path disastrously memory heavy
a = np.random.normal(size=(1000, 2))
b = np.random.normal(size=(2, 1000))
c = np.random.normal(size=(1000, 1000))
# Naive chaining (Left-to-Right)
# Creates (1000, 1000) intermediate!
res_naive = (a @ b) @ c
# Einsum Optimization
# Automatically detects that contracting (b, c) first is cheaper
res_einsum = np.einsum('ij,jk,kl->il', a, b, c, optimize=True)
np.testing.assert_allclose(res_naive, res_einsum)
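You can also inspect the path that `np.einsum` chooses with `np.einsum_path`, which returns the contraction order plus a human-readable cost summary (a minimal sketch, reusing the arrays above):

```python
import numpy as np

a = np.random.normal(size=(1000, 2))
b = np.random.normal(size=(2, 1000))
c = np.random.normal(size=(1000, 1000))

# einsum_path returns (path, info): the contraction order and a summary
# that includes the naive vs optimized FLOP counts.
path, info = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='optimal')
print(path)  # e.g. ['einsum_path', (1, 2), (0, 1)] -- contract b, c first
print(info)

# The precomputed path can be reused across repeated einsum calls.
res = np.einsum('ij,jk,kl->il', a, b, c, optimize=path)
```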
Code Visualization
einsum can be difficult to debug. It helps to visualize it as a nested loop.
Single reduced dimension
Let's visualize `bwh,whd->dbh`. We are reducing only `w`, keeping `h`, and transposing the result to `d, b, h`.
import numpy as np
batch = 10
width = 28
height = 64
d_model = 512
images = np.random.normal(size=(batch, width, height))
weights = np.random.normal(size=(width, height, d_model))
manual_out = np.zeros((d_model, batch, height))
# One loop per non-reduced dimension
for b in range(batch):
for h in range(height):
for d in range(d_model):
manual_out[d, b, h] = images[b, :, h] @ weights[:, h, d]
einsum_out = np.einsum('bwh,whd->dbh', images, weights)
np.testing.assert_almost_equal(manual_out, einsum_out)
We loop over every non-reduced dimension, extract vectors of size `w`, dot them together, and write the result at the correct (transposed) output index.
Multiple Reduced Dimensions
The bwh,whd->db einsum is more interesting because it reduces both w and h. Concretely, the only difference from the above einsum is that we revisit the same d, b output tile multiple times, so we need to accumulate the intermediate dot products into their corresponding output indices.
import numpy as np
manual_out = np.zeros((d_model, batch))
# Loop over the output dimensions (b, d), plus the reduced 'h' dimension
for b in range(batch):
for h in range(height):
for d in range(d_model):
# The 'w' dimension is reduced via the dot product (@)
# We accumulate (+=) because 'h' is also being reduced
manual_out[d, b] += images[b, :, h] @ weights[:, h, d]
einsum_out = np.einsum('bwh,whd->db', images, weights)
np.testing.assert_almost_equal(manual_out, einsum_out)
Exercises
Now let's practice writing a few einsums!
Outer Product
The outer product takes two vectors and produces a matrix, by multiplying every element of the first vector by every element of the second vector.
import numpy as np
size = 10
a = np.ones(size)
b = np.ones(size)
res = np.einsum('your_einsum', a, b) # <-- einsum here
desired = np.outer(a, b)
np.testing.assert_array_equal(res, desired)
print(f"{res.shape=}")
Solution
res = np.einsum('i,j->ij', a, b)
Dot Product
The dot product is the sum of the products of elements at corresponding indices between two vectors of the same size.
\[\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n\]
import numpy as np
size = 10
a = np.ones(size)
b = np.ones(size)
res = np.einsum('your_einsum', a, b) # <-- einsum here
desired = np.dot(a, b)
np.testing.assert_array_equal(res, desired)
print(f"{res.shape=}")
Solution
res = np.einsum('i,i->', a, b)
Inner Product
The inner product is the same operation as the dot product when performed on two vectors.
When applied to matrices, we take every row from the first matrix and calculate the dot product against every row of the second matrix. This results in a matrix where each entry (i, j) tells us how aligned the i-th row of the first matrix is with the j-th row of the second.
import numpy as np
size = 10
a = np.ones((size, 2*size))
b = np.ones((size, 2*size))
res = np.einsum('your_einsum', a, b) # <-- einsum here
desired = np.inner(a, b)
np.testing.assert_array_equal(res, desired)
print(f"{res.shape=}")
Solution
res = np.einsum('ik,jk->ij', a, b)
Path Optimizations
Let's implement an einsum between three 2-dimensional tensors.
import numpy as np
batch = 100
dim_in = 10
dim_hidden = 1000
dim_out = 20
x = np.ones((batch, dim_in))
w_in = np.ones((dim_in, dim_hidden))
w_out = np.ones((dim_hidden, dim_out))
res = np.einsum('your_einsum', x, w_in, w_out) # <-- einsum here
desired = x @ w_in @ w_out
np.testing.assert_array_equal(res, desired)
print(f"{res.shape=}")
Solution
Make sure you let einsum pick the optimal multiplication order by passing optimize=True.
res = np.einsum('bi,ih,ho->bo', x, w_in, w_out, optimize=True)
Tensor dot product
Let's implement an einsum between two 3-dimensional tensors.
We want to contract along the common dim_model dimension.
import numpy as np
batch = 100
sequence = 10
dim_model = 1000
n_heads = 2
head_dim = dim_model // n_heads
x = np.ones((batch, sequence, dim_model))
w = np.ones((dim_model, n_heads, head_dim))
res = np.einsum('your_einsum', x, w) # <-- einsum here
desired = (x[:, None, :, :] @ w.transpose(1, 0, 2)).transpose(0, 2, 1, 3)
np.testing.assert_array_equal(res, desired)
print(f"{res.shape=}")
Solution
res = np.einsum('bsd,dnh->bsnh', x, w)