Fine-tuning a pre-trained LLM for question-answering

Question

Objective

My goal is to fine-tune a pre-trained LLM on a dataset about Manchester United's (MU's) 2021/22 season (they had a poor season). I want to be able to prompt the fine-tuned model with questions such as "How can MU improve?" or "What are MU's biggest weaknesses?". The ideal responses would be insightful, logical, and at least 100 words long.

Data

I will simply use text from the relevant wiki page as my data: https://en.wikipedia.org/wiki/2021%E2%80%9322_Manchester_United_F.C._season
How should I structure my data? Should it be a list of dictionaries where the keys are the questions and the values are the answers (i.e. a list of question-answer pairs), a long string containing all the text data (for context), or a combination of both?

Notes

I have mainly been experimenting with variations of Google's T5 (e.g.: https://huggingface.co/t5-base) which I have imported from the Hugging Face Transformers library
So far I have only fine-tuned the model on a list of 30 dictionaries (question-answer pairs), e.g.: {"question": "How could Manchester United improve their consistency in the Premier League next season?", "answer": " To improve consistency, Manchester United could focus on strengthening their squad depth to cope with injuries and fatigue throughout the season. Tactical adjustments could also be explored to deal with teams of different strengths and styles."}
Use of this small dataset (list of 30 dictionaries) has given poor results

Further Questions and Notes

Other than increasing the size of my dataset, is my approach sound?
What would you recommend as a minimum number of dictionaries to train/fine-tune the model on?
I am also aware that I can tune the hyperparameters to improve performance, but for now I am more concerned about my general approach being logical

score 2 · Answer 1 · answered Jun 03 '23 at 20:28

You can try to see how far you can get with LLMs and prompting (e.g., use Alpaca-LoRA or libraries like LangChain and FastChat).

However, if you want to persist with an approach similar to your current one, given the limited data you have, I would highly recommend considering a zero-shot approach. This means you must fine-tune your T5 model on a large Q&A dataset that is unrelated to your problem domain, and then test it on your current annotated data. If you are satisfied with the model's performance, you can stop there.
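As a rough illustration of the "test it on your current annotated data" step, here is a minimal sketch that scores a model against your 30 question-answer pairs using token-overlap F1. The names `token_f1`, `evaluate_zero_shot`, and `generate_answer` are hypothetical and not part of any library; `generate_answer` stands in for whatever function wraps your fine-tuned T5 model. Token overlap is only a crude proxy for long generative answers, so treat the score as a sanity check rather than a verdict.

```python
from collections import Counter
from typing import Callable, Dict, List


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate_zero_shot(
    generate_answer: Callable[[str], str],  # hypothetical wrapper around your model
    annotated_pairs: List[Dict[str, str]],
) -> float:
    """Average token F1 of the model over the held-out, in-domain pairs."""
    if not annotated_pairs:
        raise ValueError("annotated_pairs must not be empty")
    scores = [
        token_f1(generate_answer(pair["question"]), pair["answer"])
        for pair in annotated_pairs
    ]
    return sum(scores) / len(scores)
```

You would call `evaluate_zero_shot(my_model_fn, pairs)` with your own list of `{"question": ..., "answer": ...}` dictionaries, and it is still worth reading a handful of generated answers by hand alongside the number.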

You can refer to my paper "To tune or not to tune? Zero-shot models for legal case entailment", where I deal with a very similar problem. The conclusion of the paper is that if you don't have enough data for fine-tuning, it is sometimes better to simply forgo the target domain and fine-tune your models on a well-established dataset, even if it may be on a completely different subject.

As for how you should structure your test data, I can't provide a specific answer because it's highly dependent on what is happening in your code. It's difficult to prescribe what kind of preprocessing should be done in a high-level discussion like this.
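That said, purely as a starting point, one common convention for text-to-text models such as T5 is to flatten each example into an input string and a target string, optionally appending a context passage. The sketch below assumes your pairs look like the `{"question": ..., "answer": ...}` dictionaries in the question; the helper name `to_text2text_example` and the output keys `input_text` and `target_text` are hypothetical choices, and the `question: ... context: ...` prefix follows the format T5 was trained with on SQuAD. Adapt it to whatever your training code expects.

```python
from typing import Dict, Optional


def to_text2text_example(
    pair: Dict[str, str],
    context: Optional[str] = None,
) -> Dict[str, str]:
    """Turn one question-answer dictionary into an (input, target) text pair."""
    question = pair["question"].strip()
    source = f"question: {question}"
    # Append the supporting passage only when one is provided.
    if context:
        source += f" context: {context.strip()}"
    return {"input_text": source, "target_text": pair["answer"].strip()}


example = to_text2text_example(
    {"question": "How can MU improve?", "answer": " Strengthen squad depth."},
    context="Text taken from the season's wiki page.",
)
print(example["input_text"])
print(example["target_text"])
```

Passing a context covers the "combination of both" option from the question, while omitting it gives plain question-answer pairs.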

Ruan

Hi Ruan, thank you very much for your response - the zero-shot approach sounds very interesting. My main concern is whether my problem is text generation or question answering. – Tom Bomer Jun 06 '23 at 14:23
Both. Your problem is called **generative** question answering (Generative QA). [Read more here](https://huggingface.co/tasks/question-answering) under "Task Variants". – Ruan Jun 06 '23 at 14:40
aah, thank you! – Tom Bomer Jun 06 '23 at 14:55
LangChain seems like a good option, but to use Google's PaLM or Anthropic's Claude you need to join the waitlists, and to use an OpenAI model you need to pay for the API. Do you know if LangChain supports any models that are freely available? – Tom Bomer Jun 08 '23 at 10:59
LangChain supports the [Hugging Face Hub API](https://python.langchain.com/en/latest/modules/models/llms/integrations/huggingface_hub.html) and the [Hugging Face Pipeline](https://python.langchain.com/en/latest/modules/models/llms/integrations/huggingface_pipelines.html). Both are free. – Ruan Jun 09 '23 at 10:25
