# BERT论文完整翻译与深度解析:Pre-training of Deep Bidirectional Transformers
## 论文基本信息
- 标题:BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- 作者:Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- 机构:Google AI Language
- 发表年份:2018
- 引用数:超过8万次
## 摘要
英文摘要:
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
中文翻译:
我们介绍了一种新的语言表示模型BERT,即来自Transformer的双向编码器表示。与最近的语言表示模型不同,BERT旨在通过在所有层中同时基于左右上下文进行条件建模,来预训练深度双向表示。
## 1. 导言
英文原文:
There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based and fine-tuning. The feature-based approach uses task-specific architectures that include the pre-trained representations as features.
中文翻译:
将预训练语言表示应用于下游任务有两种现有策略:基于特征的方法和微调方法。基于特征的方法使用包含预训练表示作为特征的任务特定架构。
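这两种迁移策略的核心区别在于哪些参数参与下游训练。下面是一个极简的概念性 Python 草图(`transfer` 函数及参数名均为示意用的假设,并非论文或任何库的真实 API),用来说明二者的差异:

```python
def transfer(pretrained_params, task_head_params, strategy):
    """返回在下游任务上需要更新的参数集合(概念示意)。

    feature-based:预训练表示被冻结,仅作为固定特征输入,
                   只训练任务特定模型的参数。
    fine-tuning:  预训练参数与任务头一起继续训练。
    """
    if strategy == "feature-based":
        return task_head_params                       # 预训练参数保持冻结
    elif strategy == "fine-tuning":
        return pretrained_params + task_head_params   # 全部参数参与训练
    raise ValueError(f"unknown strategy: {strategy}")
```

例如 ELMo 属于 feature-based 路线,而 BERT 采用 fine-tuning 路线:预训练后只需添加一个输出层,即可在下游任务上端到端微调全部参数。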
英文原文:
We propose BERT which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context.
中文翻译:
我们提出BERT,即来自Transformer的双向编码器表示。BERT旨在通过同时基于左右上下文进行条件建模,来预训练深度双向表示。
## 2. 模型架构
英文原文:
BERT is a multi-layer bidirectional Transformer encoder. We denote the number of layers as L, the hidden size as H, and the number of self-attention heads as A.
中文翻译:
BERT是一个多层双向Transformer编码器。我们将层数记为L,隐藏层维度记为H,自注意力头数记为A。
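按论文的记号,BERT-base 为 L=12、H=768、A=12,BERT-large 为 L=24、H=1024、A=16。下面用一个简短的 Python 草图(`BertConfig` 类名为示意,并非某个库的真实类)把这组超参数写下来,并验证每个注意力头的维度 H/A:

```python
from dataclasses import dataclass


@dataclass
class BertConfig:
    num_layers: int    # L:Transformer 编码器层数
    hidden_size: int   # H:隐藏层维度
    num_heads: int     # A:自注意力头数

    @property
    def head_dim(self) -> int:
        """每个注意力头的维度 = H / A,要求 H 能被 A 整除。"""
        assert self.hidden_size % self.num_heads == 0
        return self.hidden_size // self.num_heads


# 论文中的两种模型规格
BERT_BASE = BertConfig(num_layers=12, hidden_size=768, num_heads=12)
BERT_LARGE = BertConfig(num_layers=24, hidden_size=1024, num_heads=16)
```

可以注意到,两种规格下每个注意力头的维度都是 64(768/12 与 1024/16)。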
## 3. 预训练任务
英文原文:
We pre-train BERT using two unsupervised tasks: (1) Masked LM and (2) Next Sentence Prediction.
中文翻译:
我们使用两个无监督任务预训练BERT:(1)掩码语言建模(MLM)和(2)下一句预测(NSP)。
## 4. 核心技术名词总结
1. Pre-training(预训练):在大规模无标签数据上训练语言模型
2. Fine-tuning(微调):在特定任务上微调预训练模型
3. Masked LM(掩码语言模型):随机掩码部分token进行预测
4. Next Sentence Prediction(NSP):预测句子是否相邻
5. Transformer Encoder:仅使用编码器的Transformer架构
6. WordPiece Tokenization:子词分词方法
7. Position Embeddings:位置嵌入
8. Segment Embeddings:片段嵌入区分不同句子
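上表中的三种嵌入在BERT中按位置逐元素相加,构成模型的输入表示:对每个token,输入向量 = 词嵌入 + 片段嵌入 + 位置嵌入。下面用一个小维度的NumPy草图示意这一求和(词表大小、维度等数值均为演示用的假设;BERT-base实际为H=768、最大序列长度512):

```python
import numpy as np

H = 8  # 隐藏维度(示意用的小值)
rng = np.random.default_rng(0)
token_emb = rng.normal(size=(100, H))     # 词嵌入表:玩具词表大小 100
segment_emb = rng.normal(size=(2, H))     # 片段嵌入:句子 A / 句子 B 两种
position_emb = rng.normal(size=(512, H))  # 位置嵌入:最多 512 个位置


def embed(token_ids, segment_ids):
    """BERT 输入表示 = 词嵌入 + 片段嵌入 + 位置嵌入,按位置逐元素相加。"""
    positions = np.arange(len(token_ids))
    return (token_emb[token_ids]
            + segment_emb[segment_ids]
            + position_emb[positions])


x = embed([5, 9, 3], [0, 0, 1])  # 三个 token:前两个属句子 A,最后一个属句子 B
```

输出形状为 (序列长度, H),即每个token对应一个H维输入向量。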
## 5. 总结
BERT是NLP领域最重要的突破之一。它确立了"预训练-微调"范式,并在GLUE、SQuAD等11项NLP基准任务上刷新了当时的最佳纪录。