
I am experimenting with using transformer embeddings for sentence classification tasks without fine-tuning them. I have used BERT embeddings, and those experiments gave me very good results. Now I want to use GPT-2 embeddings (without fine-tuning), so I have a few questions:

  1. Can I use GPT-2 embeddings like that (since I know GPT-2 is trained left-to-right)?
  2. Are there any examples of GPT-2 being used in classification tasks rather than generation tasks?
  3. If I can use GPT-2 embeddings, how should I do it?

1 Answer


I basically solved the problem; here is what I found using embeddings extracted from GPT-2.

  1. Yes, we can use the final token of the GPT-2 embedding sequence as the class token. Because GPT-2's self-attention is causal (left-to-right), only the final token attends to every previous token, so it can represent the information of the whole sequence (a minimal sketch is included after this list).

  2. Please check the following GitHub issue for an implementation that uses GPT-2 embeddings: GitHub issue.

  3. I conducted experiments comparing GPT-2 embeddings with RoBERTa embeddings and got better results with RoBERTa than with GPT-2.
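
As a minimal sketch of point 1 (assuming the Hugging Face `transformers` library; the example sentences and variable names are just placeholders), one can run the frozen GPT-2 model, take the hidden state of the last non-padding token of each sentence, and use that vector as a fixed feature for any downstream classifier, much like BERT's `[CLS]` embedding:

    import torch
    from transformers import GPT2Tokenizer, GPT2Model

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token   # GPT-2 has no pad token by default
    model = GPT2Model.from_pretrained("gpt2")
    model.eval()                                # frozen, no fine-tuning

    sentences = ["I loved this movie.", "The plot made no sense."]
    enc = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

    with torch.no_grad():
        out = model(**enc)                      # last_hidden_state: (batch, seq_len, 768)

    # Index of the last real (non-padding) token in each sequence.
    last_idx = enc["attention_mask"].sum(dim=1) - 1
    sentence_emb = out.last_hidden_state[torch.arange(len(sentences)), last_idx]  # (batch, 768)

    # sentence_emb can now be fed to a simple classifier (e.g. logistic regression)
    # as a frozen sentence representation.

The key design choice is indexing the last real token per sentence via the attention mask rather than simply taking position -1, since right-padding would otherwise make you read a padding token's hidden state.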
