# Model sharding using Pipeline Parallel

Let us start with a toy model that contains two linear layers.

```
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net1 = torch.nn.Linear(10, 10)
        self.relu = torch.nn.ReLU()
        self.net2 = torch.nn.Linear(10, 5)

    def forward(self, x):
        x = self.relu(self.net1(x))
        return self.net2(x)


model = ToyModel()
```

To run this model on 2 GPUs, we need to convert the model to `torch.nn.Sequential` and then wrap it with `fairscale.nn.Pipe`.
```
import fairscale
import torch
import torch.nn as nn

model = nn.Sequential(
    torch.nn.Linear(10, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 5),
)
model = fairscale.nn.Pipe(model, balance=[2, 1])
```

This will run the first two layers on `cuda:0` and the last layer on `cuda:1`. To learn more, visit the Pipe documentation.
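
Pipe gains its speed by splitting each input mini-batch into micro-batches that the partitions process concurrently; the number of micro-batches is set with the `chunks` argument. As a sketch (the value `chunks=4` below is purely illustrative), the same pipeline could be built with micro-batching enabled, and `model.devices` reports where each partition was placed:

```
# Build the same pipeline, but split each mini-batch into 4 micro-batches so
# the two partitions can overlap work (chunks=4 is an illustrative value).
model = fairscale.nn.Pipe(
    nn.Sequential(
        torch.nn.Linear(10, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 5),
    ),
    balance=[2, 1],
    chunks=4,
)

# The devices hosting each partition, e.g. cuda:0 and cuda:1 here.
print(model.devices)
```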

You can then define any optimizer and loss function.

```
import torch.optim as optim
import torch.nn.functional as F

optimizer = optim.SGD(model.parameters(), lr=0.001)
loss_fn = F.nll_loss

optimizer.zero_grad()
# Random input batch and integer targets for this toy example.
target = torch.randint(0, 2, size=(20, 1)).squeeze()
data = torch.randn(20, 10)
```

Finally, to run the model and compute the loss, make sure that the outputs and the target are on the same device.

```
device = model.devices[0]

# outputs and target need to be on the same device
# forward step
outputs = model(data.to(device))
# compute loss
loss = loss_fn(outputs.to(device), target.to(device))
# backward + optimize
loss.backward()
optimizer.step()
```

You can find a complete example under the examples folder in the fairscale repo.
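
Putting the pieces together, here is a minimal end-to-end sketch under the same assumptions as above (a machine with at least two CUDA devices; the number of steps and the random data are illustrative only):

```
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import fairscale

# Same toy pipeline as above: first two layers on cuda:0, last layer on cuda:1.
model = nn.Sequential(
    torch.nn.Linear(10, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 5),
)
model = fairscale.nn.Pipe(model, balance=[2, 1])

optimizer = optim.SGD(model.parameters(), lr=0.001)
loss_fn = F.nll_loss
device = model.devices[0]

for _ in range(10):  # 10 training steps on random data, purely illustrative
    data = torch.randn(20, 10)
    target = torch.randint(0, 2, size=(20, 1)).squeeze()

    optimizer.zero_grad()
    outputs = model(data.to(device))
    loss = loss_fn(outputs.to(device), target.to(device))
    loss.backward()
    optimizer.step()
```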