
I am a bit unsure how best to proceed here.

The baseline is a model created with Hugging Face's Transformers library as an AutoModelForCausalLM, fine-tuned via PEFT with a LoRA approach, with the adapter weights subsequently merged into the base model.

I now want to fine-tune this model further without losing its original capabilities, in this case via instruction fine-tuning / prefix tuning.
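(For reference, prefix tuning would swap the LoRA config for PEFT's PrefixTuningConfig. A minimal sketch, where num_virtual_tokens is purely illustrative:)

from peft import PrefixTuningConfig, TaskType, get_peft_model

# illustrative prefix-tuning setup; num_virtual_tokens is an assumption
peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
model = get_peft_model(model, peft_config)

Note that, unlike LoRA, prefix-tuning weights cannot be merged into the base model with merge_and_unload().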

My approach would be the following:

import os
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, default_data_collator)
from peft import PeftConfig, PeftModel

model = AutoModelForCausalLM.from_pretrained(
        model_id,
        use_cache=False if gradient_checkpointing else True,  # the KV cache is incompatible with gradient checkpointing
        device_map="auto",
        load_in_8bit=True,
    )

model = create_peft_config(model)
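(create_peft_config is a helper that is not shown here; a minimal sketch of what such a helper might look like for an 8-bit LoRA setup, with illustrative hyperparameters:)

from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

def create_peft_config(model):
    # illustrative LoRA hyperparameters; the actual helper may differ
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
    )
    # required when fine-tuning a model loaded in 8-bit
    # (prepare_model_for_int8_training in older PEFT versions)
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model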

output_dir = "/tmp"
training_args = TrainingArguments(
        output_dir=output_dir,
        overwrite_output_dir=True,
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_train_batch_size,
        bf16=bf16,
        learning_rate=lr,
        num_train_epochs=epochs,
        gradient_checkpointing=gradient_checkpointing,
        gradient_accumulation_steps=2,
        logging_dir=f"{output_dir}/logs",
        logging_strategy="steps",
        logging_steps=10,
        optim="adafactor",
        save_strategy="epoch",
        save_total_limit=3,
        evaluation_strategy="epoch",
        load_best_model_at_end=False,
        no_cuda=False,
        auto_find_batch_size=True
)

trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset_train,
        compute_metrics=compute_metrics,
        preprocess_logits_for_metrics=preprocess_logits_for_metrics,
        eval_dataset=dataset_eval,
        data_collator=default_data_collator
)

trainer.train()

# saves only the LoRA adapter weights, not the full model
trainer.model.save_pretrained(output_dir)

del model
del trainer

# Reload the base model in fp16 and attach the trained adapter
peft_config = PeftConfig.from_pretrained(output_dir)
model = AutoModelForCausalLM.from_pretrained(
        peft_config.base_model_name_or_path,
        load_in_8bit=False,
        return_dict=True,
        device_map="auto",
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
)
# Apply the freshly trained adapter on top of the base model
model = PeftModel.from_pretrained(
        model,
        output_dir,
        torch_dtype=torch.float16,
        device_map="auto",
)
model.eval()
os.makedirs("lora", exist_ok=True)

# Merge the adapter weights into the base model and save the result
merged_model = model.merge_and_unload()
merged_model.save_pretrained('lora')

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained('lora')

In principle, I am loading the original model with the already-merged weights, fine-tuning it on new data, again with PEFT and LoRA, and afterwards merging the new adapter weights into the base model again.

Is this a sensible approach, or is there reason to believe, for example, that I might significantly compromise the original capabilities by doing so? If something speaks against it, what would be a better approach?

Kind regards and thanks in advance

After a training run of 3 epochs on about 15,000 instruction pairs, the model is created correctly, the weights are applied, and it can be loaded afterwards.

Unfortunately, you can clearly see that the model has lost its original capabilities. Prompts that worked before no longer produce usable output. You can see that the model tries to address the prompts correctly, but the quality of its responses has clearly degraded.

  • Your approach sounds fine, but keep in mind that neural networks start to [forget things](https://en.wikipedia.org/wiki/Catastrophic_interference). Why don't you train it on both datasets together? Also, you load it once in 8bit while the other time you don't load it in 8bit -> is that intended? You also pass something to the `bf16` parameter but later load it with fp16 -> make sure to not mess things up! – cronoik Apr 23 '23 at 20:39
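A minimal sketch of the comment's suggestion to train on both datasets together, assuming the original and the new instruction data are Hugging Face Dataset objects with identical columns (the dataset names are illustrative):

from datasets import concatenate_datasets

# replaying the original training data alongside the new instruction pairs
# mitigates catastrophic forgetting
mixed_train = concatenate_datasets([dataset_train_original, dataset_train_new]).shuffle(seed=42)

trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=mixed_train,
        eval_dataset=dataset_eval,
        data_collator=default_data_collator,
)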

1 Answer


The original documentation does not show how to load the model again after training: https://huggingface.co/blog/4bit-transformers-bitsandbytes
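Note that bnb_config is not defined in the snippet below; presumably it is the 4-bit NF4 configuration from the linked blog post, along these lines:

    import torch
    from transformers import BitsAndBytesConfig

    # assumed 4-bit config, following the linked blog post
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )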

    trainer.model.save_pretrained('./bits')
    ...
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    #model_id = '/root/.cache/huggingface/hub/models--EleutherAI--gpt-neo-1.3B/snapshots/0f35a20339a9e208bc061570981180cfb87c9572'

    peft_config = PeftConfig.from_pretrained('bits')
    model = AutoModelForCausalLM.from_pretrained(
            peft_config.base_model_name_or_path,
            quantization_config=bnb_config, device_map={"":0}
            #load_in_8bit=False,
            #return_dict=True,
            #device_map="auto",
            #torch_dtype=torch.float16,
            #low_cpu_mem_usage=True,
    )

However, there are some additional steps if you are using a PeftModel, as shown in https://github.com/huggingface/blog/blob/main/peft.md:

      peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)

      model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
    + model = get_peft_model(model, peft_config)
    + model.print_trainable_parameters()
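The quoted snippet targets a seq2seq model; for the causal-LM setup in the question, the corresponding (assumed) config would use TaskType.CAUSAL_LM instead:

    from peft import LoraConfig, TaskType, get_peft_model

    peft_config = LoraConfig(task_type=TaskType.CAUSAL_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
    model = get_peft_model(model, peft_config)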
– thistleknot