
# Full Translation and Analysis of the ResNet Paper: Deep Residual Learning for Image Recognition

## Paper Information

  • Title: Deep Residual Learning for Image Recognition
  • Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  • Affiliation: Microsoft Research
  • Year: 2015
  • Venue: CVPR 2016
  • Citations: over 150,000

---

## Abstract

Original:
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, rather than learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

---

## 1. Introduction

Original:
Deep convolutional neural networks have led to a series of breakthroughs for image classification. Deep networks naturally integrate low/mid/high-level features and classifiers in an end-to-end fashion, and the levels of features can be enriched by the number of stacked layers.

Original:
When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error.

---

## 2. Deep Residual Learning

Original:
We let H(x) be the underlying mapping to be fit by a few stacked layers (not necessarily the entire net), with x denoting the inputs to the first of these layers. If one hypothesizes that multiple nonlinear layers can asymptotically approximate complicated functions, then it is equivalent to hypothesize that they can asymptotically approximate the residual functions.

Original:
So rather than expect stacked layers to approximate H(x), we explicitly let these layers approximate a residual function F(x) := H(x) - x. The original function thus becomes F(x) + x.
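A minimal numeric sketch of this reformulation (dense matrices stand in for the paper's convolutional layers; the function names are illustrative, not from the paper): if the optimal underlying mapping H happens to be the identity, the residual block realizes it with all-zero weights, whereas a plain block must learn the identity matrix exactly.

```python
import numpy as np

def plain_block(x, W):
    """A plain stacked layer must fit the full mapping H(x) = W @ x."""
    return W @ x

def residual_block(x, W):
    """A residual layer fits only F(x) = W @ x; the block outputs F(x) + x."""
    return W @ x + x

x = np.array([1.0, 2.0, 3.0])

# With all-zero weights, F(x) = 0, so the residual block already outputs
# the identity mapping; the plain block would need to learn W = I instead.
W_zero = np.zeros((3, 3))
assert np.allclose(residual_block(x, W_zero), x)
assert np.allclose(plain_block(x, np.eye(3)), x)
```

This is the intuition the paper gives for why driving a residual toward zero is easier for the solver than fitting an identity mapping with a stack of nonlinear layers.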

---

## 3. Residual Blocks

Original:
The formulation of F(x)+x can be realized by feedforward neural networks with "shortcut connections". Shortcut connections are those that skip one or more layers. In our case, the shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers.

Original:
Identity mapping is chosen for the shortcut connections because it introduces neither extra parameters nor extra computational complexity. The entire network can still be trained end-to-end by SGD with backpropagation, and can be easily implemented using common libraries.
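A sketch of such a block, under stated simplifications: dense layers stand in for the paper's two 3x3 convolutions, and the batch normalization the paper applies after each convolution is omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def basic_block(x, W1, b1, W2, b2):
    """Two stacked weight layers plus an identity shortcut.
    Output is relu(F(x) + x), where F(x) = W2 @ relu(W1 @ x + b1) + b2."""
    out = relu(W1 @ x + b1)  # first weight layer + nonlinearity
    out = W2 @ out + b2      # second weight layer (no ReLU yet)
    return relu(out + x)     # add the identity shortcut, then ReLU

# The shortcut itself adds no parameters: only W1, b1, W2, b2 are learned.
d = 4
x = np.ones(d)
y = basic_block(x, np.zeros((d, d)), np.zeros(d), np.zeros((d, d)), np.zeros(d))
# With all-zero weights, F(x) = 0 and the block outputs relu(x), i.e. x here.
assert np.allclose(y, x)
```

Note that the addition requires F(x) and x to have the same dimensions; the paper handles dimension changes with a linear projection on the shortcut.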

---

## 4. Network Architecture

Original:
Plain Network: We construct plain networks as follows. The first conv layer uses 7x7 filters with stride 2, followed by max pooling with stride 2. After that, the building blocks use mostly 3x3 filters, and downsampling is performed directly by convolutional layers that have a stride of 2. The network ends with a global average pooling layer and a 1000-way fully-connected layer with softmax.

Original:
Residual Network: Based on the above plain network, we insert shortcut connections, which turn the network into its counterpart residual version.
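The resulting stage layouts can be summarized compactly. The block counts below are the ones the paper gives for ResNet-18 and ResNet-34; the small helper that derives the total layer count from them is our own illustration:

```python
# Blocks per stage for the two basic-block variants, as listed in the paper
# (four stages between the initial 7x7 conv and the final fc layer).
blocks_per_stage = {"resnet18": [2, 2, 2, 2], "resnet34": [3, 4, 6, 3]}

def count_layers(blocks):
    # 1 initial 7x7 conv + 2 conv layers per basic block + 1 final fc layer
    return 1 + sum(b * 2 for b in blocks) + 1

assert count_layers(blocks_per_stage["resnet18"]) == 18
assert count_layers(blocks_per_stage["resnet34"]) == 34
```

The naming convention simply counts weight layers: pooling and the shortcut additions contribute no parameters and are not counted.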

---

## 5. Experimental Results

Original:
Table 1 shows the top-1 error rates on the ImageNet validation set. For plain nets, the deeper 34-layer plain net has higher validation error than the shallower 18-layer plain net.

Original:
With the residual learning formulation, the situation is reversed: the 34-layer ResNet is better than the 18-layer ResNet. Moreover, the 18-layer plain and residual nets are comparably accurate, but the 18-layer ResNet converges faster.

---

## 6. Key Technical Terms

1. Residual Learning: learning a residual function instead of the direct mapping

2. Residual Block: the basic building module with a shortcut connection

3. Identity Mapping: a shortcut connection that applies no transformation

4. Shortcut Connection: a connection that skips one or more layers

5. Deep Residual Network: a deep network built from residual blocks

6. Gradient Flow: residual connections ease gradient propagation

7. Degradation Problem: the accuracy of deeper plain networks saturates and then degrades

8. Bottleneck Architecture: the 1x1-3x3-1x1 residual block design

9. Batch Normalization: applied after each convolution to stabilize training

10. Skip Connection: lets gradients flow directly backward
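The bottleneck design in item 8 can be sketched in the same simplified style as before (dense matrices stand in for the 1x1 and 3x3 convolutions; the 256/64 widths are the ones the paper uses in its deeper models, and the random weights are purely illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def bottleneck_block(x, W_reduce, W_mid, W_expand):
    """1x1-3x3-1x1 bottleneck: the two 1x1 layers reduce and then restore
    the dimension, so the middle 3x3 layer operates on fewer channels."""
    out = relu(W_reduce @ x)  # 1x1 conv: 256 -> 64 dimensions
    out = relu(W_mid @ out)   # 3x3 conv: 64 -> 64
    out = W_expand @ out      # 1x1 conv: 64 -> 256
    return relu(out + x)      # identity shortcut on the full width

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = bottleneck_block(x,
                     0.01 * rng.standard_normal((64, 256)),
                     0.01 * rng.standard_normal((64, 64)),
                     0.01 * rng.standard_normal((256, 64)))
assert y.shape == x.shape  # the shortcut forces matching input/output widths
```

This is what makes the 50/101/152-layer networks affordable: the expensive 3x3 layer runs at a quarter of the block's input width.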

---

## 7. Summary

ResNet addressed a long-standing difficulty in training deep neural networks. By introducing residual learning and shortcut connections, it made training networks of more than 100 layers practical. ResNet achieved breakthrough results on ImageNet classification: the 152-layer network reached a 3.57% top-5 error rate and won the ILSVRC 2015 classification competition.

Core contributions of ResNet:

1. Identifying the degradation problem: as plain networks grow deeper, accuracy first saturates and then degrades

2. Residual learning: each block learns a residual rather than the direct mapping

3. Shortcut connections: identity mappings that let gradients propagate directly

4. Scalability: networks with over 1000 layers were successfully trained (on CIFAR-10)

ResNet has become a standard backbone in computer vision and is widely used for image classification, object detection, and semantic segmentation. Its ideas have also been adopted beyond vision, for example in NLP, and have shaped the development of deep learning.