
I am using the pre-trained GPT2 model for a research project. When I load it with the following code,

from transformers.models.gpt2.modeling_gpt2 import GPT2Model
gpt2 = GPT2Model.from_pretrained('gpt2')

I get the following warning message:

Some weights of GPT2Model were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

From my understanding, this says that the weights of the listed layers were not initialized from the pre-trained checkpoint. But attention layers ('attn') are central to GPT2, and if their actual weights cannot be taken from the pre-trained model, then what is the point of using a pre-trained model?

I would really appreciate it if someone could explain this and tell me how I can fix it.

K.N

1 Answer


The masked_bias was added by the Hugging Face community as a speed improvement over the original implementation. It should not negatively impact performance, as the original weights are loaded properly. Check this PR for further information.
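For illustration, here is a minimal sketch (my own, not part of the original answer, and it assumes a transformers version that still registers the masked_bias buffer): masked_bias is just a constant used to fill masked attention positions, not a learned parameter, and the learned weights really do come from the checkpoint, as two independent loads produce identical outputs.

import torch
from transformers import GPT2Model, GPT2Tokenizer

gpt2 = GPT2Model.from_pretrained('gpt2')        # loaded in eval mode
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# masked_bias is a constant buffer (-1e4), not a trained weight,
# so "newly initialized" is harmless here.
print(gpt2.h[0].attn.masked_bias)

# The learned parameters come from the checkpoint: a second,
# independent load yields exactly the same outputs on the same input.
gpt2_again = GPT2Model.from_pretrained('gpt2')
inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    out_a = gpt2(**inputs)[0]
    out_b = gpt2_again(**inputs)[0]
print(torch.allclose(out_a, out_b))  # True

If the warning itself is bothersome, calling transformers.logging.set_verbosity_error() before loading should silence it in recent transformers releases.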

cronoik