All-To-All

All-To-All is a more general operation in which every device exchanges data with every other device. It is typically used when indexing a sharded array. A custom All-To-All can send different elements to each device depending on a runtime condition, as in Mixture of Experts, where the router decides at runtime which expert receives each token.

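The exchange pattern can be sketched without any accelerator library. The snippet below is a minimal simulation in plain Python, not a real collective: devices are modeled as list indices, and the names `all_to_all` and `route_tokens` are hypothetical helpers introduced only for illustration. It assumes one expert per device for the Mixture of Experts case.

```python
def all_to_all(send):
    """Simulate All-To-All. send[src][dst] is the chunk src sends to dst."""
    n = len(send)
    # Every device must provide exactly one chunk per destination device.
    assert all(len(chunks) == n for chunks in send)
    # Device dst receives the dst-th chunk of every source, in source order.
    return [[send[src][dst] for src in range(n)] for dst in range(n)]


def route_tokens(tokens, expert_ids, num_devices):
    """Bucket one device's tokens by destination (assumes one expert per device)."""
    buckets = [[] for _ in range(num_devices)]
    for token, expert in zip(tokens, expert_ids):
        buckets[expert].append(token)
    return buckets


# Regular case: each of 3 devices sends one fixed-size chunk to every device.
send = [[f"{src}->{dst}" for dst in range(3)] for src in range(3)]
recv = all_to_all(send)

# Mixture-of-Experts-style case: chunk sizes depend on a runtime routing decision.
moe_send = [
    route_tokens(["a", "b", "c"], [1, 0, 1], num_devices=2),  # device 0
    route_tokens(["d", "e"], [0, 0], num_devices=2),          # device 1
]
moe_recv = all_to_all(moe_send)
```

In the regular case the result is a transpose of the chunk grid: `recv[dst][src]` equals `send[src][dst]`. In the routed case the chunks have different lengths, and some may be empty, which is what makes a custom All-To-All necessary on real hardware.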