I am working on fine-tuning LLaMA 2 with a dataset containing chat history. While preparing the data, I've noticed that the dialogue doesn't always alternate between speakers: in some cases, one person sends several messages in a row before the other replies.
Here's an example of what I mean:
- Person A: Hi!
- Person A: How are you?
- Person B: Hello, I'm fine. You?
- Person A: I'm good too. Thanks for asking!
- Person A: What are you up to?
My concern is whether this inconsistency might pose a problem when training LLaMA 2. Specifically, I'm worried that the lack of strict alternation could hurt the model's ability to follow the context of the conversation.
Will this inconsistency affect the performance or behavior of the fine-tuned model?
I'm using: https://github.com/georgesung/llm_qlora/tree/main
Sample dataset
```json
[
  {
    "id": "AAaDWwu_0",
    "conversations": [
      {
        "from": "human",
        "value": "Hello there"
      },
      {
        "from": "human",
        "value": "are you there?"
      },
      {
        "from": "gpt",
        "value": "Oh hi!"
      }
    ]
  },
  {
    "id": "AAaDWwu_2",
    "conversations": [
      {
        "from": "human",
        "value": "Ola"
      },
      {
        "from": "human",
        "value": "Did you have a fun day?"
      },
      {
        "from": "gpt",
        "value": "Yes!"
      }
    ]
  }
]
```
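One option I'm considering is to merge consecutive messages from the same speaker into a single turn before training, so every conversation strictly alternates between `human` and `gpt`. A minimal preprocessing sketch (the `merge_consecutive_turns` helper, the newline separator, and the file names are my own, not part of the llm_qlora repo):

```python
import json

def merge_consecutive_turns(conversations, sep="\n"):
    """Collapse consecutive messages from the same speaker into one turn."""
    merged = []
    for msg in conversations:
        if merged and merged[-1]["from"] == msg["from"]:
            # Same speaker as the previous message: append to the open turn.
            merged[-1]["value"] += sep + msg["value"]
        else:
            merged.append({"from": msg["from"], "value": msg["value"]})
    return merged

# Hypothetical file names, just for illustration.
with open("dataset.json") as f:
    data = json.load(f)

for sample in data:
    sample["conversations"] = merge_consecutive_turns(sample["conversations"])

with open("dataset_merged.json", "w") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
```

With this, the first example above collapses to a single human turn ("Hello there\nare you there?") followed by the gpt reply.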
My config:

```yaml
model_name: open_llama_7b_qlora_uncensored
base_model: georgesung/llama2_7b_chat_uncensored
model_family: llama  # if unspecified will use AutoModelForCausalLM/AutoTokenizer
model_context_window: 2048  # if unspecified will use tokenizer.model_max_length
data:
  type: vicuna
  dataset: xxx/yyy  # HuggingFace hub
  user_header: "### HUMAN:\n"
  response_header: "### RESPONSE:\n"
lora:
  r: 8
  lora_alpha: 32
  target_modules:  # modules for which to train lora adapters
    - q_proj
    - k_proj
    - v_proj
  lora_dropout: 0.05
  bias: none
  task_type: CAUSAL_LM
trainer:
  batch_size: 1
  gradient_accumulation_steps: 4
  warmup_steps: 100
  num_train_epochs: 5
  learning_rate: 0.0002  # 2e-4
  logging_steps: 20
trainer_output_dir: trainer_outputs/
model_output_dir: models/  # model saved in {model_output_dir}/{model_name}
```
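For context on why I'm asking: as far as I understand, the vicuna-type loader flattens each conversation into plain text using the `user_header`/`response_header` above, roughly like the sketch below (my own approximation for illustration, not the repo's actual code). With my data, two human messages in a row would then produce two `### HUMAN:` blocks back to back with no response in between:

```python
# Rough illustration of how one conversation might be rendered into training
# text with the headers from the config above (approximation, not repo code).
USER_HEADER = "### HUMAN:\n"
RESPONSE_HEADER = "### RESPONSE:\n"

def render_prompt(conversations):
    parts = []
    for msg in conversations:
        header = USER_HEADER if msg["from"] == "human" else RESPONSE_HEADER
        parts.append(header + msg["value"])
    return "\n\n".join(parts)

sample = [
    {"from": "human", "value": "Hello there"},
    {"from": "human", "value": "are you there?"},
    {"from": "gpt", "value": "Oh hi!"},
]
print(render_prompt(sample))
# ### HUMAN:
# Hello there
#
# ### HUMAN:
# are you there?
#
# ### RESPONSE:
# Oh hi!
```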