Mixture Of Experts

class fairscale.nn.MOELayer(gate: torch.nn.Module, experts: Union[torch.nn.Module, torch.nn.ModuleList], group: Optional[Any] = None)

MOELayer module that implements a Mixture of Experts layer as described in GShard. Typical usage:

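from fairscale.nn import MOELayer, Top2Gate

# expert: a torch.nn.Module (or ModuleList) defined by the caller
gate = Top2Gate(model_dim, num_experts)
moe = MOELayer(gate, expert)
output = moe(input)
l_aux = moe.l_aux  # auxiliary loss set during the forward pass

Parameters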
  • gate – gate network

  • experts – expert network, either a single module or a ModuleList of modules

  • group – group to use for all-to-all communication (optional, defaults to None)
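
To make the roles of the gate, the experts, and l_aux more concrete, here is a minimal conceptual sketch of top-2 routing in plain Python. It is not fairscale's implementation: the helper names (softmax, top2_route, moe_forward) are hypothetical, expert capacity limits and all-to-all communication are omitted, and the auxiliary loss shown is an assumed load-balancing term in the spirit of GShard whose exact form and scaling may differ from what the library computes.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Route = List[Tuple[int, float]]  # [(expert_index, weight), ...]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one token's gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(gate_logits: Sequence[Sequence[float]]) -> Tuple[List[Route], float]:
    """Pick the two highest-probability experts per token.

    Returns the per-token routes and an assumed load-balancing loss:
    the mean over experts of (mean gate probability) * (fraction of
    tokens whose first choice is that expert).
    """
    num_tokens = len(gate_logits)
    num_experts = len(gate_logits[0])
    mean_prob = [0.0] * num_experts
    top1_frac = [0.0] * num_experts
    routes: List[Route] = []
    for logits in gate_logits:
        probs = softmax(logits)
        ranked = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)
        first, second = ranked[0], ranked[1]
        norm = probs[first] + probs[second]  # renormalize over the chosen pair
        routes.append([(first, probs[first] / norm), (second, probs[second] / norm)])
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens
        top1_frac[first] += 1.0 / num_tokens
    l_aux = sum(m * c for m, c in zip(mean_prob, top1_frac)) / num_experts
    return routes, l_aux


def moe_forward(
    tokens: Sequence[Vector],
    gate_logits: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Vector], Vector]],
) -> Tuple[List[Vector], float]:
    """Combine the outputs of each token's two chosen experts by gate weight."""
    routes, l_aux = top2_route(gate_logits)
    outputs: List[Vector] = []
    for token, route in zip(tokens, routes):
        combined = [0.0] * len(token)
        for expert_index, weight in route:
            expert_out = experts[expert_index](token)
            combined = [c + weight * y for c, y in zip(combined, expert_out)]
        outputs.append(combined)
    return outputs, l_aux


# Toy setup: three "experts" that scale their input by 1, 2, and 3.
experts = [lambda x, k=k: [k * v for v in x] for k in (1.0, 2.0, 3.0)]
tokens = [[1.0, 2.0], [3.0, 4.0]]
gate_logits = [[5.0, 5.0, -50.0], [-50.0, 0.0, 0.0]]
outputs, l_aux = moe_forward(tokens, gate_logits, experts)
# outputs == [[1.5, 3.0], [7.5, 10.0]]
```

In this toy run the first token is split evenly between experts 0 and 1, and the second between experts 1 and 2. In MOELayer the same two roles are played by the gate module, which decides the routing and produces l_aux, and the experts, which process the tokens dispatched to them.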
