For fine-tuning of the large language models (llama2), what should be the format(.text/.json/.csv) and structure (like should be an excel or docs file or prompt and response or instruction and output) of the training dataset? And also how to prepare or organise the tabular dataset for training purpose?
I made a spreadsheet which contain around 2000 instruction and output pair and use meta-llama/Llama-2-13b-chat-hf model. But when start querying through the spreadsheet using the above model it gives wrong answers most of the time & also repeat it many times. So I want to know that what kind of docs format & it's structure i should try for fine-tuning the llama2.