Mixture Of Experts

class fairscale.nn.MOELayer(gate: torch.nn.Module, experts: Union[torch.nn.Module, torch.nn.ModuleList], group: Optional[Any] = None)

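MOELayer module which implements a Mixture of Experts (MoE) layer as described in GShard. After a forward pass, the auxiliary loss produced by the gate is available as the l_aux attribute.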

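from fairscale.nn import MOELayer, Top2Gate

# model_dim, num_experts, expert, and input are assumed to be defined by the caller
gate = Top2Gate(model_dim, num_experts)
moe = MOELayer(gate, expert)
output = moe(input)
l_aux = moe.l_aux  # auxiliary loss from the most recent forward pass

Parameters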
  • gate – gate network (for example, Top2Gate)

  • experts – expert network, given as a single module or a ModuleList of modules

  • group – process group to use for all-to-all communication
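
To make the roles of the gate, the experts, and l_aux more concrete, here is a minimal conceptual sketch of top-2 routing in plain Python. It is not fairscale's implementation: it works on scalar "tokens", ignores expert capacity and the all-to-all communication across the group, and the helper names softmax and top2_moe are hypothetical. The auxiliary loss follows the GShard-style form (mean over experts of mean gate probability times the share of tokens whose first choice is that expert), which is assumed here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(tokens, gate_logits, experts):
    """Route each token to its two highest-scoring experts and mix their outputs.

    tokens:      list of floats (one scalar per token, for brevity)
    gate_logits: per-token list of per-expert scores (the gate's output)
    experts:     list of callables mapping float -> float
    Returns (outputs, l_aux).
    """
    num_experts = len(experts)
    num_tokens = len(tokens)
    outputs = []
    mean_gate = [0.0] * num_experts  # mean gate probability per expert
    top1_frac = [0.0] * num_experts  # share of tokens whose first choice is each expert

    for x, logits in zip(tokens, gate_logits):
        probs = softmax(logits)
        # Rank experts by gate probability and keep the best two.
        ranked = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)
        first, second = ranked[0], ranked[1]
        # Renormalize the two selected weights so they sum to 1.
        norm = probs[first] + probs[second]
        y = (probs[first] * experts[first](x) + probs[second] * experts[second](x)) / norm
        outputs.append(y)

        for e in range(num_experts):
            mean_gate[e] += probs[e] / num_tokens
        top1_frac[first] += 1.0 / num_tokens

    # Assumed GShard-style load-balancing term: small when routing is spread evenly.
    l_aux = sum(m * c for m, c in zip(mean_gate, top1_frac)) / num_experts
    return outputs, l_aux


# Three toy experts and two tokens; a very negative logit effectively masks an expert.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]
tokens = [1.0, 2.0]
gate_logits = [[0.0, 0.0, -1e9], [-1e9, 0.0, 0.0]]
outputs, l_aux = top2_moe(tokens, gate_logits, experts)
```

In training, a loss of this kind is typically added to the task loss with a small weight so that the gate learns to spread tokens across experts; with MOELayer, the corresponding value is read from moe.l_aux after calling the layer.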