Complete Translation and Analysis of the BERT Paper: Pre-training of Deep Bidirectional Transformers


# Paper Information

  • Title: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
  • Affiliation: Google AI Language
  • Year: 2018
  • Citations: over 80,000

# Abstract

Original text:
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.

# 1. Introduction

Original text:
There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based and fine-tuning. The feature-based approach uses task-specific architectures that include the pre-trained representations as features.

Original text:
We propose BERT which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context.

# 2. Model Architecture

Original text:
BERT is a multi-layer bidirectional Transformer encoder. We denote the number of layers as L, the hidden size as H, and the number of self-attention heads as A.
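
Under this L/H/A notation, the paper reports two model sizes: BERT-base (L=12, H=768, A=12) and BERT-large (L=24, H=1024, A=16). A minimal sketch of these configurations — the `BertConfig` class and its field names here are illustrative, not from any library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BertConfig:
    num_layers: int    # L: number of Transformer encoder layers
    hidden_size: int   # H: hidden dimension
    num_heads: int     # A: number of self-attention heads

# Model sizes reported in the paper
BERT_BASE  = BertConfig(num_layers=12, hidden_size=768,  num_heads=12)
BERT_LARGE = BertConfig(num_layers=24, hidden_size=1024, num_heads=16)

# In a standard Transformer, H is split evenly across the attention heads
head_dim = BERT_BASE.hidden_size // BERT_BASE.num_heads  # 64 for BERT-base
```

Note that H is divisible by A in both configurations, since each head attends in a subspace of dimension H/A.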

# 3. Pre-training Tasks

Original text:
We pre-train BERT using two unsupervised tasks: (1) Masked LM and (2) Next Sentence Prediction.
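
The masked-LM corruption can be sketched as follows. In the full paper, BERT masks 15% of token positions; of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The function and parameter names below are my own, not from any library:

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Sketch of BERT's masked-LM corruption: select ~15% of positions;
    of those, 80% become [MASK], 10% a random vocab token, 10% unchanged.
    Returns the corrupted sequence and a labels list that holds the
    original token at masked positions and None elsewhere."""
    rng = random.Random(seed)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok                       # prediction target
            r = rng.random()
            if r < 0.8:
                corrupted[i] = mask_token         # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)  # 10%: replace with random token
            # else: 10% of masked positions keep the original token
    return corrupted, labels
```

The model is then trained to predict the original token only at the positions where `labels` is not None; keeping some positions unchanged discourages the model from relying on `[MASK]` always marking a corrupted slot.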

# 4. Key Technical Terms

1. Pre-training: training a language model on large-scale unlabeled data
2. Fine-tuning: adapting the pre-trained model to a specific downstream task
3. Masked LM (MLM): randomly masking a portion of the tokens and predicting them
4. Next Sentence Prediction (NSP): predicting whether two sentences are adjacent
5. Transformer Encoder: an encoder-only Transformer architecture
6. WordPiece Tokenization: a subword tokenization method
7. Position Embeddings: embeddings encoding each token's position
8. Segment Embeddings: embeddings distinguishing the two input sentences
# 5. Summary

BERT is one of the most important breakthroughs in NLP: it established the "pre-train, then fine-tune" paradigm and set new state-of-the-art records on benchmarks such as GLUE and SQuAD.
