
I am using the pre-trained GPT2 model for a research project. When I load it with the following code,

from transformers.models.gpt2.modeling_gpt2 import GPT2Model
gpt2 = GPT2Model.from_pretrained('gpt2')

I get the following warning message:

Some weights of GPT2Model were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

From my understanding, this says that the weights of the listed layers were not initialized from the pre-trained checkpoint. But attention layers ('attn') are central to GPT2, and if their actual weights cannot be taken from the pre-trained model, then what is the point of using a pre-trained model?

I would really appreciate it if someone could explain this and tell me how I can fix it.

K.N

1 Answer


The masked_bias was added by the Hugging Face community as a speed improvement over the original implementation. It should not negatively impact performance, as the original weights are loaded properly. Check this PR for further information.
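For illustration, here is a minimal sketch (my own, not part of the original answer, and it assumes a transformers version that still registers the masked_bias buffer): masked_bias is just a constant used to fill masked attention positions, not a learned parameter, and the learned weights really do come from the checkpoint, as two independent loads produce identical outputs.

import torch
from transformers import GPT2Model, GPT2Tokenizer

gpt2 = GPT2Model.from_pretrained('gpt2')        # loaded in eval mode
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# masked_bias is a constant buffer (-1e4), not a trained weight,
# so "newly initialized" is harmless here.
print(gpt2.h[0].attn.masked_bias)

# The learned parameters come from the checkpoint: a second,
# independent load yields exactly the same outputs on the same input.
gpt2_again = GPT2Model.from_pretrained('gpt2')
inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    out_a = gpt2(**inputs)[0]
    out_b = gpt2_again(**inputs)[0]
print(torch.allclose(out_a, out_b))  # True

If the warning itself is bothersome, calling transformers.logging.set_verbosity_error() before loading should silence it in recent transformers releases.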

cronoik