I know the standard way of training a GPT-2 model on custom documents is to first do unsupervised (self-supervised) fine-tuning on the text of the documents, followed by supervised fine-tuning on question-answer pairs built from the same documents. But if the sole purpose of the supervised step is to teach the model the style of answering questions, is it possible to do the supervised fine-tuning on a general question-answer dataset first, and then perform unsupervised fine-tuning on our custom text dataset from the documents? This way the model would acquire the question-answering style, with the added advantage that no question-answer dataset has to be created for the custom documents.
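To make the proposed order concrete, here is a minimal sketch of how the data for the two stages might be prepared. It only builds the training strings and does not run any training. The `Question:`/`Answer:` template, the function names, and the word-based chunking are my own assumptions for illustration, not part of any library; `<|endoftext|>` is GPT-2's end-of-text token.

```python
from typing import Dict, List

EOS = "<|endoftext|>"  # GPT-2's end-of-text token


def format_qa_example(question: str, answer: str) -> str:
    """Stage 1 (supervised): turn one general-domain QA pair into a training string.

    The "Question:/Answer:" template is an assumed prompt format.
    """
    return f"Question: {question.strip()}\nAnswer: {answer.strip()}{EOS}"


def chunk_document(text: str, max_words: int = 200) -> List[str]:
    """Stage 2 (unsupervised): split raw document text into word-bounded chunks.

    Each chunk is used as-is for next-token prediction; no labels are needed.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_two_stage_corpora(
    general_qa: List[Dict[str, str]],
    documents: List[str],
    max_words: int = 200,
) -> Dict[str, List[str]]:
    """Return the training strings for both stages, in the proposed order."""
    stage1 = [format_qa_example(p["question"], p["answer"]) for p in general_qa]
    stage2 = [chunk for doc in documents for chunk in chunk_document(doc, max_words)]
    return {"stage1_sft": stage1, "stage2_lm": stage2}
```

Stage 1 would be trained on `stage1_sft` first, and the resulting checkpoint would then be fine-tuned further on `stage2_lm`.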
Will it give the desired results?