Finetuning Open LLMs

Question

I am a newbie trying to learn fine tuning. Started with falcon 7B instruct LLM as my base LLM and want to fine tune this with open assistant instruct dataset. I have 2080 Ti with 11G VRAM. So I am using 4 bit quantization and Lora.

These are the experiments I did so far:

1> I trained with SFT trainer from hugging face for 25000 epochs, the loss decreased from 1.8 to 0.7. Below is the entire code I am using for training.

import torch, einops
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    AutoTokenizer,
    TrainingArguments
)
from peft.tuners.lora import LoraLayer

from trl import SFTTrainer


def create_and_prepare_model():
    compute_dtype = getattr(torch, "float16")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=True,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct", quantization_config=bnb_config, device_map={"": 0}, trust_remote_code=True
    )

    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=[
            "query_key_value"
        ],
    )

    tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token

    return model, peft_config, tokenizer


training_arguments = TrainingArguments(
    output_dir="./results_falcon-7b-instruct-new",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=10,
    optim="paged_adamw_32bit",
    save_steps=5,
    logging_steps=10,
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    max_steps=20,
    warmup_ratio=0.03,
    # group_by_length=True,
    lr_scheduler_type="constant",
)

model, peft_config, tokenizer = create_and_prepare_model()
model.config.use_cache = False
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=True,
)

trainer.train()
trainer.save_model("falcon-instruct-7b-4bit-openassist-latest-new")
model.config.to_json_file("falcon-instruct-7b-4bit-openassist-latest-new/config.json")

took about 53 hours. But the model just spits out gibberish when asked for a simple question like "how are you?"

2> 300 epochs, loss went down from 1.8 to 1.5 but the model still spits out gibberish.

3> 40 epochs, loss went down from 1.8 to 1.7 but the model still spits out gibberish.

Any pointers that could give me a head start? Please suggest. Any open source code to do something similar will be greatly appreciated. Thanks a lot.

JCR000 · Answer 1 · 2023-06-27T02:24:54.280

1) Match you prompts to the dataset

Is what you are entering into the generation prompt look like what is being fine-tuned with?

Generally the LLM will generate the desired output when you use the same format the fine-tuning dataset is formatted in. This formatting helps "steer" or "contextualize" the generation text.

The Alpaca datasets generally follow the following format:

### Instruction:
(Instruction Text)

### Input:
(Auxiliary Input Text)

### Response:
(Desired Response Text)

Vicuna datasets generally follow the following format:

A chat between a human and an assistant.

### Human:
(Question Text)
### Assistant:
(Response Text)

Another format formally described recently in the Microsoft Orca paper:

<System>: (You are a helpful <role> ai assistant)
<Instruction>: (Instruction Text Goes Here)
<Input>: (Other input goes here)
<Response>: (The desired response goes here)

Be mindful of line breaks ( also end of text symbols if the LLM model has any in pretraining ) in your dataset and prompt. Vicuna inference prompt e.g.

### Human:
What shape is the Earth?
### Assistant:

If you are using transformers directly in python to perform inference you will have to add the "### Assistant:\n" line to the end of you prompt minding the how line breaks '\n' are handled in the dataset. LLM's are glorious auto-completes if not quite a stochastic parrots.

The Vicuna format excels for chatbot fine-tuning prompts. The Alpaca and Orca format are formats are useful for instruction following models trained to provide information in a specific format. This topic is evolving and in practice the nitty gritty of prompt engineering most users never see or think about. That all said these formats are not magic just one part of a solution to generating interpretable responses aligned with intents that merits strict attention.

2) When you have prompt and dataset formats all accounted for, go back to hyper-parameter optimization.

Create a training data set of 1000 items pulled at random from your cornerstone dataset and an evaluation of 300 random samples.
Use the transformers supported hyperparameter optimization methods
- optuna, sigopt, raytune, wandb
- More from Huggingface

3) Text Generation after 20 Training Steps

The following example works though throws a warning as the Falcon model is not well integrated into huggingface transformers library.

from transformers import pipeline
prompt = """### Human: Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research. ### Assistant:"""
    
    
pipe = pipeline(
            "text-generation",
            model=model,
            tokenizer=tokenizer,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
            device_map="auto",
    )
        
sequences = pipe(
            prompt,
            max_length=100,
            do_sample=True,
            top_k=10,
            num_return_sequences=1,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id
    )

for seq in sequences:
    print(seq['generated_text'])
        
>>>
### Human: 
Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.

### Assistant:
In economics, a monopsony is a market position where a single entity has enough market power to exercise price-setting and product differentiation strategies. In particular, a labour market monopsony occurs when a single employer has the ability

Finetuning Open LLMs

1 Answers1