
I am experimenting with using transformer embeddings for sentence classification tasks without fine-tuning them. I have used BERT embeddings, and those experiments gave me very good results. Now I want to use GPT-2 embeddings (without fine-tuning), so I have a few questions:

  1. Can I use GPT-2 embeddings like that (since I know GPT-2 is trained left-to-right)?
  2. Are there any examples of GPT-2 being used in classification tasks rather than generation tasks?
  3. If I can use GPT-2 embeddings, how should I do it?

1 Answer


I basically solved the problem; here is what I found using embeddings extracted from GPT-2.

  1. Yes, we can use the final token of the GPT-2 embedding sequence as the class token. Because GPT-2's self-attention is causal (left-to-right), only the final token attends to every previous token, so it can represent the information of the whole sequence (a minimal sketch is included after this list).

  2. Please check the following GitHub issue for an implementation that uses GPT-2 embeddings: GitHub issue.

  3. I conducted experiments comparing GPT-2 embeddings with RoBERTa embeddings and got better results with RoBERTa than with GPT-2.
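
As a minimal sketch of point 1 (assuming the Hugging Face `transformers` library; the example sentences and variable names are just placeholders), one can run the frozen GPT-2 model, take the hidden state of the last non-padding token of each sentence, and use that vector as a fixed feature for any downstream classifier, much like BERT's `[CLS]` embedding:

    import torch
    from transformers import GPT2Tokenizer, GPT2Model

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token   # GPT-2 has no pad token by default
    model = GPT2Model.from_pretrained("gpt2")
    model.eval()                                # frozen, no fine-tuning

    sentences = ["I loved this movie.", "The plot made no sense."]
    enc = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

    with torch.no_grad():
        out = model(**enc)                      # last_hidden_state: (batch, seq_len, 768)

    # Index of the last real (non-padding) token in each sequence.
    last_idx = enc["attention_mask"].sum(dim=1) - 1
    sentence_emb = out.last_hidden_state[torch.arange(len(sentences)), last_idx]  # (batch, 768)

    # sentence_emb can now be fed to a simple classifier (e.g. logistic regression)
    # as a frozen sentence representation.

The key design choice is indexing the last real token per sentence via the attention mask rather than simply taking position -1, since right-padding would otherwise make you read a padding token's hidden state.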
