Mixture Of Experts
- class fairscale.nn.MOELayer(gate: torch.nn.modules.module.Module, experts: Union[torch.nn.modules.module.Module, torch.nn.modules.container.ModuleList], group: Optional[Any] = None)
MOELayer module which implements Mixture of Experts as described in the GShard paper.
gate = Top2Gate(model_dim, num_experts)
moe = MOELayer(gate, expert)
output = moe(input)
l_aux = moe.l_aux
- Parameters
gate – gating network that routes tokens to experts (e.g. Top2Gate)
experts – expert network, given either as a single module or as a ModuleList of expert modules
group – process group to use for all-to-all communication
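A slightly fuller sketch of how the pieces fit together is shown below. It assumes MOELayer and Top2Gate are importable from fairscale.nn, that each expert maps model_dim to model_dim, that the layer is called on a 3-D (sequence, batch, model_dim) tensor, and that a distributed process group supporting all-to-all (e.g. NCCL) has already been initialized; the sizes are made up for illustration.

```python
import torch
import torch.nn as nn

from fairscale.nn import MOELayer, Top2Gate

# Hypothetical sizes for illustration only.
model_dim = 512    # hidden size of each token representation
num_experts = 4    # number of experts visible to the gate

# MOELayer dispatches tokens to experts with an all-to-all, so a process
# group must be initialized before the forward pass, e.g.:
# torch.distributed.init_process_group(backend="nccl", ...)

# Each expert is an ordinary module mapping model_dim -> model_dim.
experts = nn.ModuleList(
    [nn.Linear(model_dim, model_dim) for _ in range(num_experts)]
)

gate = Top2Gate(model_dim, num_experts)
moe = MOELayer(gate, experts)

# Assumed input layout: (sequence, batch, model_dim).
x = torch.randn(16, 2, model_dim)
output = moe(x)        # same shape as the input
l_aux = moe.l_aux      # auxiliary load-balancing loss, to be added to the task loss
```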