Code analysis: official PyTorch ResNet

As one of the most popular networks, ResNet has been applied in many computer vision tasks, such as classification, object detection (e.g., as the backbone), GANs, and so on. This article is a code analysis of the official PyTorch ResNet. The code can be found at https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py (besides, I recommend the code from mmdetection: https://github.com/summit1993/mmdetection/blob/master/mmdet/models/backbones/resnet.py). For an analysis in Chinese, you can refer to https://www.cnblogs.com/wzyuan/p/9880342.html (which was not written by me).

ResNet has a series of versions: resnet18, resnet34, resnet50, resnet101, and resnet152. Generally, they share a similar overall structure but differ in block type (BasicBlock or Bottleneck), the number of stacked blocks, depth, and so on. More details can be found in the following figure.

Fig. 1 Resnet Structure
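For reference, the variants summarized in the figure correspond to the following constructor calls, which appear at the bottom of the same torchvision file (the ResNet, BasicBlock, and Bottleneck classes are analyzed later in this article):

# Block type and per-stage block counts for each variant.
resnet18  = ResNet(BasicBlock, [2, 2, 2, 2])
resnet34  = ResNet(BasicBlock, [3, 4, 6, 3])
resnet50  = ResNet(Bottleneck, [3, 4, 6, 3])
resnet101 = ResNet(Bottleneck, [3, 4, 23, 3])
resnet152 = ResNet(Bottleneck, [3, 8, 36, 3])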

Block

As shown in the following figure, there are two types of blocks: the first is BasicBlock, which is used in resnet18 and resnet34, and the second is Bottleneck, which is used in the other, deeper ResNets.
Fig. 2 Resnet Block
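Both block types rely on two small helper functions defined near the top of the same file; they are reproduced here so that the snippets below are self-contained:

import torch.nn as nn

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)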

BasicBlock

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        # inplanes: input channels
        # planes: output channels (in fact, the output channels equal
        # planes * expansion; since expansion is always 1
        # in this block, the output channels equal planes)
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

There are two layers in BasicBlock: the first layer is a 3x3 convolution with padding, followed by BatchNorm2d and ReLU; the second layer is similar to the first but has no ReLU. The identity (x, or downsample(x) when shapes differ) is added to the output of the two layers, and a final ReLU is applied to the sum.
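As a minimal sanity check (the block instance, input size, and tensor below are illustrative), a BasicBlock whose input and output channels match needs no downsample branch and preserves the spatial size:

import torch

block = BasicBlock(inplanes=64, planes=64)  # 64 -> 64, stride 1: identity needs no projection
x = torch.randn(1, 64, 56, 56)              # e.g., the feature map entering layer1 of resnet18
print(block(x).shape)                       # torch.Size([1, 64, 56, 56])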

Bottleneck

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # inplanes: input channels
        # planes: compressed channels (i.e., output channels after conv1x1)
        # the output channels of the Bottleneck equal planes * expansion
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

There are three layers in Bottleneck: the first layer (conv1x1) compresses the input features (with inplanes channels) to planes channels to reduce the computation cost, and is followed by BatchNorm2d and ReLU (the second layer follows the same pattern; the third is followed by BatchNorm2d only). Then, the second layer (conv3x3) extracts spatial features at the compressed width. Lastly, the third layer (conv1x1) expands to the desired output channels of the Bottleneck (planes * expansion). As in BasicBlock, the identity (x, or downsample(x)) is added to the output of the three layers before the final ReLU.
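A minimal sketch of the shape-matching logic (the sizes are illustrative and mirror the first block of layer1 in resnet50): since the block outputs planes * expansion channels, the identity must be projected by a 1x1 convolution whenever the channel count (or stride) changes:

import torch
import torch.nn as nn

# 64 -> 256 channels at stride 1: the identity is projected so the sum is valid.
downsample = nn.Sequential(
    conv1x1(64, 64 * Bottleneck.expansion),
    nn.BatchNorm2d(64 * Bottleneck.expansion),
)
block = Bottleneck(inplanes=64, planes=64, stride=1, downsample=downsample)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 256, 56, 56])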

ResNet

class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False):
        # block: BasicBlock or Bottleneck
        # layers: list of the numbers of blocks stacked per stage, e.g., resnet101 is [3, 4, 23, 3]
        super(ResNet, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        # block: BasicBlock or Bottleneck
        # planes: base channels of the stage (the stage outputs planes * block.expansion channels)
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

The ResNet has three parts. The first is a 7x7 convolution (stride 2), followed by 3x3 max pooling (stride 2). The second is the main part: four stacked stages (called self.layer1 to self.layer4 in the code), each built by the function _make_layer. Please note the difference in input channels between the first block and the other blocks within a stage: the input channels of the first block are self.inplanes (so it receives a downsample branch whenever the stride or channel count changes), while the input channels of the other blocks are planes * block.expansion. The last part is average pooling, followed by a fully connected layer.
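The channel and resolution progression can be traced with a short sketch (the resnet50 configuration and the 224x224 input are illustrative):

import torch

model = ResNet(Bottleneck, [3, 4, 6, 3])  # resnet50 configuration
x = torch.randn(1, 3, 224, 224)

x = model.maxpool(model.relu(model.bn1(model.conv1(x))))
print(x.shape)  # torch.Size([1, 64, 56, 56]): the stem halves the resolution twice
for stage in (model.layer1, model.layer2, model.layer3, model.layer4):
    x = stage(x)
    print(x.shape)
# [1, 256, 56, 56] -> [1, 512, 28, 28] -> [1, 1024, 14, 14] -> [1, 2048, 7, 7]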

def resnet101(pretrained=False, **kwargs):
    """Constructs a ResNet-101 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
    return model

Finally, take resnet101 as an example. To instantiate ResNet-101, we just need to set the following parameters:
pretrained: whether to load the ImageNet pre-trained weights; default False
num_classes: the number of output classes; default 1000
zero_init_residual: whether to zero-initialize the last BatchNorm weight in each residual branch; default False
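A minimal usage sketch (the input size and flag values are illustrative; pretrained=True additionally requires the model_urls table and the model_zoo import from the same file):

import torch

model = resnet101(pretrained=False, num_classes=1000, zero_init_residual=True)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])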