ResNet comes in several versions: resnet18, resnet34, resnet50, resnet101, and resnet152. They share the same overall structure and differ in the block type (BasicBlock or Bottleneck), the number of blocks stacked in each stage, and therefore the depth. More details can be found in the following figure.
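The differences between the versions can be summarized in a few lines of plain Python. This is only a sketch, not the torchvision code: it assumes the standard per-stage block counts (for example [3, 4, 23, 3] for resnet101), counts 2 convolutions per BasicBlock and 3 per Bottleneck, and adds the first convolution and the final fully connected layer. Convolutions in the downsample path are not counted. The names RESNET_CONFIGS, CONVS_PER_BLOCK and depth are made up for illustration.

```python
# Block type and number of blocks per stage for each version.
RESNET_CONFIGS = {
    "resnet18": ("BasicBlock", [2, 2, 2, 2]),
    "resnet34": ("BasicBlock", [3, 4, 6, 3]),
    "resnet50": ("Bottleneck", [3, 4, 6, 3]),
    "resnet101": ("Bottleneck", [3, 4, 23, 3]),
    "resnet152": ("Bottleneck", [3, 8, 36, 3]),
}

# Convolutions on the main branch of each block type.
CONVS_PER_BLOCK = {"BasicBlock": 2, "Bottleneck": 3}


def depth(name):
    """Count weighted layers: first conv + block convs + final fc."""
    block, layers = RESNET_CONFIGS[name]
    return 1 + CONVS_PER_BLOCK[block] * sum(layers) + 1


for name in RESNET_CONFIGS:
    print(name, depth(name))
```

With these assumptions the computed depth matches the number in each model name, which is where the names come from.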
Block
As shown in the following figure, there are two types of blocks. The first is BasicBlock, which is used in resnet18 and resnet34; the second is Bottleneck, which is used in the deeper versions (resnet50, resnet101, and resnet152).
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        # inplanes: input channels
        # planes: output channels (in fact, the output channels equal
        # planes * expansion; since expansion is always 1 in this block,
        # the output channels equal planes)
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # Match the shape of the shortcut to the main branch if needed.
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
There are two layers in BasicBlock: the first is a 3x3 convolution with padding, followed by BatchNorm2d and ReLU; the second is the same but without its own ReLU. The identity (x, or downsample(x) when a downsample module is given) is added to the output of the two layers, and a final ReLU is applied to the sum.
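The addition only works when the identity and the output of the two layers have the same shape, which is why downsample exists. The sketch below checks this with plain shape arithmetic. It assumes the 3x3 convolutions use padding 1 and that the stride is applied in the first convolution; the helper names conv_out_size and basic_block_shapes are hypothetical.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def basic_block_shapes(size, inplanes, planes, stride=1):
    """Return (main_branch_shape, identity_shape, needs_downsample)."""
    # conv1 may change resolution (stride) and channels; conv2 keeps both.
    out_size = conv_out_size(conv_out_size(size, stride=stride))
    main = (planes, out_size, out_size)
    identity = (inplanes, size, size)
    return main, identity, main != identity


# Same channels, stride 1: shapes match, so x is added directly.
print(basic_block_shapes(56, 64, 64))
# More channels and stride 2: shapes differ, so downsample(x) is needed.
print(basic_block_shapes(56, 64, 128, stride=2))
```

In the second case the main branch produces 128 channels at 28x28 while x still has 64 channels at 56x56, so the shortcut has to be projected before the sum.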
    # Forward pass of Bottleneck
    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
There are three layers in Bottleneck. The first layer (conv1x1) compresses the input features (with inplanes channels) to features1 (with planes channels) to reduce the computation cost; it is followed by BatchNorm2d and ReLU. The second layer (conv3x3, also followed by BatchNorm2d and ReLU) produces features2. Lastly, the third layer (conv1x1, followed by BatchNorm2d only) expands to the output channels of the Bottleneck (planes * expansion). As in BasicBlock, the identity (x or downsample(x)) is added to the output of the three layers before the final ReLU.
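To get a feeling for how much the 1x1 compression saves, the following sketch counts convolution weights. It assumes bias-free convolutions and expansion = 4 for Bottleneck, and compares against two 3x3 convolutions that work at the full channel width; the function names are made up for this example.

```python
def conv_weights(in_ch, out_ch, kernel):
    """Number of weights in a bias-free kernel x kernel convolution."""
    return in_ch * out_ch * kernel * kernel


def bottleneck_weights(inplanes, planes, expansion=4):
    """Weights of the three convolutions on the Bottleneck main branch."""
    return (
        conv_weights(inplanes, planes, 1)              # conv1x1: compress
        + conv_weights(planes, planes, 3)              # conv3x3: features2
        + conv_weights(planes, planes * expansion, 1)  # conv1x1: expand
    )


def two_conv3x3_weights(channels):
    """Two 3x3 convolutions operating at the full channel width."""
    return 2 * conv_weights(channels, channels, 3)


print(bottleneck_weights(256, 64))  # 69632
print(two_conv3x3_weights(256))     # 1179648
```

For a 256-channel input, the Bottleneck needs roughly 17 times fewer weights than two full-width 3x3 convolutions, and at a fixed resolution the number of multiply-adds scales the same way.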
        # Weight initialization (inside ResNet.__init__)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)
        # Stacking the blocks of one stage (end of ResNet._make_layer)
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x
ResNet has three parts. The first is the stem: conv7x7 with BatchNorm2d and ReLU, followed by maxpool3x3. The second is the main part, which generally consists of four stages of stacked blocks (called self.layer1 to self.layer4 in the code). Each stage is built by the function _make_layer. Note the difference in input channels within a stage: the first block takes self.inplanes channels (the output of the previous stage), while the remaining blocks take planes * block.expansion channels. The last part is avgpool, followed by a fully connected layer.
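The channel bookkeeping of _make_layer can be traced without building any modules. The sketch below is an illustration under stated assumptions, none of which appear in the snippets above: a 224x224 input, a stem with stride 2 and padding 3 followed by a max pool with stride 2 and padding 1, planes of 64, 128, 256, 512, and stage strides of 1, 2, 2, 2. The function trace_resnet is hypothetical.

```python
def trace_resnet(layers, expansion, input_size=224):
    """Return [(stage, block_idx, in_channels, out_channels, size), ...]."""
    size = (input_size + 2 * 3 - 7) // 2 + 1   # conv7x7, stride 2, padding 3
    size = (size + 2 * 1 - 3) // 2 + 1         # maxpool3x3, stride 2, padding 1
    inplanes = 64                              # channels after the stem
    trace = []
    stage_cfg = zip((64, 128, 256, 512), (1, 2, 2, 2), layers)
    for stage, (planes, stride, blocks) in enumerate(stage_cfg, start=1):
        for i in range(blocks):
            if i == 0:
                # Only the first block of a stage may reduce the resolution.
                size = (size - 1) // stride + 1
            out_channels = planes * expansion
            trace.append((stage, i, inplanes, out_channels, size))
            inplanes = out_channels  # later blocks see planes * expansion
    return trace


# resnet101: Bottleneck (expansion 4) with [3, 4, 23, 3] blocks per stage.
for row in trace_resnet([3, 4, 23, 3], expansion=4)[:4]:
    print(row)
```

The first block of each stage is the only one whose input and output channels differ, which is exactly where a downsample module is needed on the shortcut.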
def resnet101(pretrained=False, **kwargs):
    """Constructs a ResNet-101 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
    return model
Finally, take resnet101 as an example. To create a resnet101 model, we only need to set the following parameters:
pretrained: whether to load the weights pre-trained on ImageNet, default False
num_classes: the number of output classes, default 1000
zero_init_residual: whether to zero-initialize the weight of the last BatchNorm2d in each residual block, default False
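To see why zero_init_residual makes every residual block start out as an identity mapping, here is a small numerical sketch in plain Python. It models only the affine part of the last batch norm (weight times normalized input plus bias) and uses made-up numbers; the helper names are hypothetical and no PyTorch is involved.

```python
def batchnorm_affine(x_hat, weight, bias=0.0):
    """Affine part of batch norm: weight * normalized_input + bias."""
    return [weight * v + bias for v in x_hat]


def relu(values):
    return [max(0.0, v) for v in values]


def residual_output(identity, branch_normalized, last_bn_weight):
    """Output of a residual block: relu(last_bn(branch) + identity)."""
    branch = batchnorm_affine(branch_normalized, last_bn_weight)
    return relu([b + i for b, i in zip(branch, identity)])


identity = [0.5, 0.0, 2.0]             # non-negative, as after a previous ReLU
branch_normalized = [1.3, -0.7, 0.2]   # normalized activations of the branch

# Default initialization (weight 1): the branch changes the output.
print(residual_output(identity, branch_normalized, last_bn_weight=1.0))
# Zero initialization (weight 0): the block returns its input unchanged.
print(residual_output(identity, branch_normalized, last_bn_weight=0.0))
```

With the last batch norm weight at zero the residual branch contributes nothing, so at the start of training the signal passes through each block unchanged.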