MobileNet

A Classic Among Classics: MobileNet

Hello everyone. Today we continue our tour of classic network models in computer vision. The model we introduce this time is a lightweight network called MobileNet. As its name suggests, it was designed for deep learning applications on mobile and embedded devices, aiming to reach acceptable inference speed even on a CPU.

Introduction

Network structure:


Figure 1. The MobileNet network architecture

As the architecture above shows, to stay lightweight MobileNet drops nearly all pooling operations (downsampling is done with stride-2 convolutions instead, with a single average pool before the classifier), which makes the network structure more compact.

Key Innovation

The most important change in MobileNet is replacing the standard convolution with a depthwise separable convolution, i.e. a depthwise convolution followed by a pointwise convolution, which greatly reduces the amount of computation.


Figure 2. Left: a standard convolution block. Right: the same block after splitting into a depthwise convolution and a pointwise convolution


Figure 3. Replacing a standard convolution with a depthwise convolution and a pointwise convolution

So how exactly is the replacement done? Suppose we have 3 input feature maps and 4 kernels. In a standard convolution, every kernel convolves every feature map and the results are summed, producing 4 output feature maps.


Figure 4. Standard convolution
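To make the shapes concrete, here is a minimal PyTorch sketch of the example above (3 input feature maps, 4 kernels); the tensor sizes are illustrative choices, not values from the paper:

import torch
import torch.nn as nn

# Standard convolution: every kernel sees all 3 input channels.
x = torch.randn(1, 3, 32, 32)                # one sample with 3 feature maps
std_conv = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=False)
print(std_conv(x).shape)                     # torch.Size([1, 4, 32, 32])
print(std_conv.weight.shape)                 # torch.Size([4, 3, 3, 3]): 4*3*3*3 = 108 weights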

First, in a depthwise convolution each kernel is responsible for exactly one channel, and each channel is convolved by exactly one kernel.


Figure 5. Depthwise convolution

The number of kernels now differs from the standard convolution, so to keep the output channel count consistent with it, a pointwise convolution, that is, a 1×1 convolution, is applied to fuse the resulting feature maps with different learned weights.


Figure 6. Pointwise convolution
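Chaining the two steps reproduces the standard convolution's output shape. A minimal sketch continuing the example above (again with illustrative sizes):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# Depthwise: groups=3 gives each of the 3 channels its own 3x3 kernel (3*1*3*3 = 27 weights).
depthwise = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)
# Pointwise: a 1x1 convolution mixes the 3 channels into 4 (4*3*1*1 = 12 weights).
pointwise = nn.Conv2d(3, 4, kernel_size=1, bias=False)

out = pointwise(depthwise(x))
print(out.shape)   # torch.Size([1, 4, 32, 32]), same as the standard convolution
# 27 + 12 = 39 weights here versus 108 for the standard convolution.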

Intuitively, the depthwise convolution restricts each kernel to a single feature map, avoiding having every kernel convolve every input map, which is where the savings come from. The original paper gives a formal derivation of the reduction in computation; interested readers can consult it.
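For reference, the paper's counting argument: with a D_K × D_K kernel, M input channels, N output channels, and a D_F × D_F output feature map, a standard convolution costs

D_K · D_K · M · N · D_F · D_F

multiply-accumulates, while the depthwise + pointwise pair costs

D_K · D_K · M · D_F · D_F + M · N · D_F · D_F.

Dividing the two gives a reduction factor of 1/N + 1/D_K²; with 3×3 kernels and the large N used in MobileNet, that works out to roughly 8 to 9 times less computation.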

Code

The code below is from https://github.com/weiaicunzai/pytorch-cifar100.

"""mobilenet in pytorch
[1] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
https://arxiv.org/abs/1704.04861
"""

import torch
import torch.nn as nn


class DepthSeperabelConv2d(nn.Module):

def __init__(self, input_channels, output_channels, kernel_size, **kwargs):
super().__init__()
self.depthwise = nn.Sequential(
nn.Conv2d(
input_channels,
input_channels,
kernel_size,
groups=input_channels,
**kwargs),
nn.BatchNorm2d(input_channels),
nn.ReLU(inplace=True)
)

self.pointwise = nn.Sequential(
nn.Conv2d(input_channels, output_channels, 1),
nn.BatchNorm2d(output_channels),
nn.ReLU(inplace=True)
)

def forward(self, x):
x = self.depthwise(x)
x = self.pointwise(x)

return x


class BasicConv2d(nn.Module):

def __init__(self, input_channels, output_channels, kernel_size, **kwargs):

super().__init__()
self.conv = nn.Conv2d(
input_channels, output_channels, kernel_size, **kwargs)
self.bn = nn.BatchNorm2d(output_channels)
self.relu = nn.ReLU(inplace=True)

def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.relu(x)

return x


class MobileNet(nn.Module):

"""
Args:
width multipler: The role of the width multiplier α is to thin
a network uniformly at each layer. For a given
layer and width multiplier α, the number of
input channels M becomes αM and the number of
output channels N becomes αN.
"""

def __init__(self, width_multiplier=1, class_num=100):
super().__init__()

alpha = width_multiplier
self.stem = nn.Sequential(
BasicConv2d(3, int(32 * alpha), 3, padding=1, bias=False),
DepthSeperabelConv2d(
int(32 * alpha),
int(64 * alpha),
3,
padding=1,
bias=False
)
)

#downsample
self.conv1 = nn.Sequential(
DepthSeperabelConv2d(
int(64 * alpha),
int(128 * alpha),
3,
stride=2,
padding=1,
bias=False
),
DepthSeperabelConv2d(
int(128 * alpha),
int(128 * alpha),
3,
padding=1,
bias=False
)
)

#downsample
self.conv2 = nn.Sequential(
DepthSeperabelConv2d(
int(128 * alpha),
int(256 * alpha),
3,
stride=2,
padding=1,
bias=False
),
DepthSeperabelConv2d(
int(256 * alpha),
int(256 * alpha),
3,
padding=1,
bias=False
)
)

#downsample
self.conv3 = nn.Sequential(
DepthSeperabelConv2d(
int(256 * alpha),
int(512 * alpha),
3,
stride=2,
padding=1,
bias=False
),

DepthSeperabelConv2d(
int(512 * alpha),
int(512 * alpha),
3,
padding=1,
bias=False
),
DepthSeperabelConv2d(
int(512 * alpha),
int(512 * alpha),
3,
padding=1,
bias=False
),
DepthSeperabelConv2d(
int(512 * alpha),
int(512 * alpha),
3,
padding=1,
bias=False
),
DepthSeperabelConv2d(
int(512 * alpha),
int(512 * alpha),
3,
padding=1,
bias=False
),
DepthSeperabelConv2d(
int(512 * alpha),
int(512 * alpha),
3,
padding=1,
bias=False
)
)

#downsample
self.conv4 = nn.Sequential(
DepthSeperabelConv2d(
int(512 * alpha),
int(1024 * alpha),
3,
stride=2,
padding=1,
bias=False
),
DepthSeperabelConv2d(
int(1024 * alpha),
int(1024 * alpha),
3,
padding=1,
bias=False
)
)

self.fc = nn.Linear(int(1024 * alpha), class_num)
self.avg = nn.AdaptiveAvgPool2d(1)

def forward(self, x):
x = self.stem(x)

x = self.conv1(x)
x = self.conv2(x)
x = self.conv3(x)
x = self.conv4(x)

x = self.avg(x)
x = x.view(x.size(0), -1)
x = self.fc(x)
return x


def mobilenet(alpha=1, class_num=100):
return MobileNet(alpha, class_num)
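As a quick sanity check, a minimal usage sketch with CIFAR-100-sized inputs (32×32 images, 100 classes), matching the repository the code comes from:

net = mobilenet(alpha=1, class_num=100)
x = torch.randn(2, 3, 32, 32)    # a dummy batch of two CIFAR-sized images
print(net(x).shape)              # torch.Size([2, 100])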

See you next time.