Questions tagged [peft]

16 questions
4 votes, 1 answer

Target modules for applying PEFT / LoRA on different models

I am looking at a few different examples of using PEFT on different models. The LoraConfig object contains a target_modules array. In some examples, the target modules are ["query_key_value"], sometimes it is ["q", "v"], sometimes something else. I…
ahron
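The valid names depend on how each architecture names its layers. Llama-style models use q_proj, k_proj, v_proj and o_proj for attention, Falcon and GPT-NeoX use query_key_value, and T5 uses q, k, v and o. One way to find candidates for a given model is to collect the leaf names of its linear layers. A minimal sketch, assuming you feed it (name, class name) pairs built from model.named_modules(). The function name, the default class list (which includes the bitsandbytes classes Linear4bit and Linear8bitLt used when loading quantized models) and the example module list are illustrative, not part of PEFT.

```python
from typing import Iterable, List, Tuple


def candidate_target_modules(
    named_modules: Iterable[Tuple[str, str]],
    layer_types: Tuple[str, ...] = ("Linear", "Linear4bit", "Linear8bitLt"),
    exclude: Tuple[str, ...] = ("lm_head",),
) -> List[str]:
    """Return sorted unique leaf names of modules whose class name is in layer_types.

    Hypothetical helper: the input pairs would typically come from
    [(n, type(m).__name__) for n, m in model.named_modules()].
    """
    names = set()
    for full_name, class_name in named_modules:
        if class_name not in layer_types:
            continue
        # "model.layers.0.self_attn.q_proj" -> "q_proj"
        leaf = full_name.split(".")[-1]
        if leaf in exclude:
            continue
        names.add(leaf)
    return sorted(names)


# Illustrative input mimicking one Llama-style decoder layer (not a real dump).
example = [
    ("model.layers.0.self_attn.q_proj", "Linear"),
    ("model.layers.0.self_attn.k_proj", "Linear"),
    ("model.layers.0.self_attn.v_proj", "Linear"),
    ("model.layers.0.self_attn.o_proj", "Linear"),
    ("model.layers.0.mlp.gate_proj", "Linear"),
    ("model.layers.0.input_layernorm", "LlamaRMSNorm"),
    ("lm_head", "Linear"),
]
print(candidate_target_modules(example))
# ['gate_proj', 'k_proj', 'o_proj', 'q_proj', 'v_proj']
```

lm_head is excluded by default because it is usually not adapted; any subset of the returned names can go into LoraConfig(target_modules=[...]).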
4 votes, 2 answers

How to load a fine-tuned PEFT/LoRA model based on Llama with Hugging Face Transformers?

I've followed this tutorial (Colab notebook) to fine-tune my model. Loading my locally saved model with model = AutoModelForCausalLM.from_pretrained("finetuned_model") yields Killed. Trying to load the model from the hub yields import…
Lucas Azevedo
2 votes, 1 answer

Further fine-tune a PEFT/LoRA fine-tuned CausalLM model

I am a bit unsure how to proceed on this topic. The baseline is an AutoModelForCausalLM model created with Hugging Face's library, fine-tuned with PEFT using a LoRA approach, with the weights subsequently merged. I now want to further fine…
1 vote, 1 answer

Llama QLora error: Target modules ['query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h'] not found in the base model

EDIT: solved by removing target_modules. I tried to load the Llama-2-7b-hf LLM with QLoRA using the following code: model_id = "meta-llama/Llama-2-7b-hf" tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True) # I have permissions. model…
Ofir
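The names in that error (query_key_value, dense, dense_h_to_4h, dense_4h_to_h) follow Falcon/GPT-NeoX naming, which Llama-2 does not use, so a check before calling get_peft_model can catch the mismatch early. A minimal sketch, assuming that a list-style target counts as found when some module name equals it or ends with "." plus the target; this is a simplification of PEFT's actual matching logic, and the helper name and module list are hypothetical.

```python
from typing import Iterable, List


def missing_target_modules(requested: Iterable[str], module_names: Iterable[str]) -> List[str]:
    """Return the requested targets that match no module name (hypothetical helper).

    A target matches a module name if the name equals it or ends with "." + target,
    approximating how string targets are matched against model.named_modules().
    """
    names = list(module_names)
    missing = []
    for target in requested:
        if not any(n == target or n.endswith("." + target) for n in names):
            missing.append(target)
    return missing


# Illustrative Llama-style module names (not a full model dump).
llama_like = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.down_proj",
]
print(missing_target_modules(["query_key_value", "dense", "q_proj"], llama_like))
# ['query_key_value', 'dense']
```

With a loaded model, module_names would be [n for n, _ in model.named_modules()].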
1 vote, 0 answers

Getting CUDA out of memory when calling save_pretrained in a script that LoRA-trains a large language model using Hugging Face

I am trying to train a Llama LLM ("eachadea/vicuna-13b-1.1") using LoRA on a LambdaLabs A100 40 GB. Everything seems to work fine, including the training; however, the script fails on the last line:…
1 vote, 1 answer

big_modeling.py not finding the offload_dir

I'm trying to load a large model on my local machine and offload some of the compute to my CPU, since my GPU isn't great (MacBook Air M2). Here's my code: from peft import PeftModel from transformers import AutoTokenizer, GPTJForCausalLM,…
Matthew Berman
0 votes, 1 answer

How to directly load a fine-tuned model like Alpaca-LoRA (PeftModel()) from local files instead of loading it from Hugging Face models?

I have fine-tuned a Llama model using low-rank adaptation (LoRA) with the peft package. The resulting files adapter_config.json and adapter_model.bin are saved. I can load the fine-tuned model from Hugging Face using the following code: model =…
a7777777
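PeftModel.from_pretrained(base_model, path) also accepts a local directory that holds the saved adapter files, and adapter_config.json records which base model the adapter was trained on under base_model_name_or_path. A minimal sketch of a local-directory check, assuming the two file names from the question plus adapter_model.safetensors (which newer peft versions write by default); the function name and the placeholder model id are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def local_adapter_base_model(adapter_dir: str) -> Optional[str]:
    """Return base_model_name_or_path if adapter_dir looks like a saved PEFT adapter, else None."""
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    weight_files = ("adapter_model.bin", "adapter_model.safetensors")
    if not os.path.isfile(config_path):
        return None
    if not any(os.path.isfile(os.path.join(adapter_dir, f)) for f in weight_files):
        return None
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh).get("base_model_name_or_path")


# Self-check with a throwaway directory; "base-model-placeholder" is not a real model id.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "adapter_config.json"), "w", encoding="utf-8") as fh:
        json.dump({"base_model_name_or_path": "base-model-placeholder"}, fh)
    open(os.path.join(d, "adapter_model.bin"), "wb").close()
    print(local_adapter_base_model(d))  # base-model-placeholder

# With transformers and peft installed (not run here), the local path is passed directly:
# base = AutoModelForCausalLM.from_pretrained(local_adapter_base_model("path/to/adapter_dir"))
# model = PeftModel.from_pretrained(base, "path/to/adapter_dir")
```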
0 votes, 0 answers

Questions about distributed fine-tuning of a transformers model (chatglm) with Accelerate on Kaggle GPUs

I am trying to fine-tune the chatglm-6b model using LoRA with transformers and peft on Kaggle GPUs (2×T4). The model structure: The traditional loading method (AutoModel.from_pretrained) needs to load the model itself (15 GB) onto the CPU first, whereas…
0 votes, 0 answers

How to load the finetuned model (merged weights) on colab?

I have fine-tuned the Llama 2 model, reloaded the base model and merged the LoRA weights. I saved this final merged model and now intend to run it. base_model = AutoModelForCausalLM.from_pretrained( model_name, …
Gaurav Gupta
0 votes, 0 answers

Combine base model with my PEFT adapters to generate a new model

I am trying to merge my fine-tuned adapters into the base model with this: torch.cuda.empty_cache() del model pre_trained_model_checkpoint = "databricks/dolly-v2-3b" trained_model_chekpoint_output_folder =…
Hanzo
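Merging folds each LoRA update into the frozen weight it adapts. With the standard scaling, the merged weight is W0 + (lora_alpha / r) · B A, where B has shape d × r and A has shape r × k; this is the arithmetic PEFT's merge_and_unload() performs for LoRA layers, after which the returned model can be saved with save_pretrained like an ordinary Transformers model. A pure-Python sketch on toy matrices; the helper names and numbers are hypothetical, and real code would call merge_and_unload() on the PeftModel instead.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product: (m x n) @ (n x p) -> (m x p)."""
    inner = len(y)
    return [
        [sum(x[i][t] * y[t][j] for t in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(w0: Matrix, b: Matrix, a: Matrix, lora_alpha: float, r: int) -> Matrix:
    """Return W0 + (lora_alpha / r) * (B @ A), the merged weight of one LoRA layer."""
    scaling = lora_alpha / r
    delta = matmul(b, a)  # (d x r) @ (r x k) -> (d x k), same shape as w0
    return [
        [w0[i][j] + scaling * delta[i][j] for j in range(len(w0[0]))]
        for i in range(len(w0))
    ]


w0 = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, d = k = 2
b = [[1.0], [2.0]]             # d x r with r = 1
a = [[0.5, 0.5]]               # r x k
print(merge_lora(w0, b, a, lora_alpha=2, r=1))
# [[2.0, 1.0], [2.0, 3.0]]
```

Because the update is added into W0, the merged model needs no peft code at inference time, and the adapter can no longer be detached from it.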
0 votes, 0 answers

LoRA fine-tuning taking too long

Why is this giving me an expected processing time of a month? More importantly, how can I speed it up? My dataset is a collection of 20k short sentences (max 100 words each). import transformers import torch model_id =…
Lucas Azevedo
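When an estimated training time looks far too long, a first sanity check is how many optimizer steps the configuration implies and how long each one takes. A rough sketch, assuming the usual data-parallel relation between examples, per-device batch size, gradient accumulation and device count; only the 20k dataset size comes from the question, and the batch settings and seconds per step below are hypothetical.

```python
import math


def estimated_training_hours(
    num_examples: int,
    per_device_batch_size: int,
    grad_accum_steps: int,
    num_devices: int,
    epochs: float,
    seconds_per_step: float,
) -> float:
    """Rough wall-clock estimate: number of optimizer steps x seconds per optimizer step."""
    examples_per_step = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / examples_per_step)
    total_steps = math.ceil(steps_per_epoch * epochs)
    return total_steps * seconds_per_step / 3600


# 20k sentences from the question; every other value is a hypothetical placeholder.
hours = estimated_training_hours(
    20_000,
    per_device_batch_size=4,
    grad_accum_steps=4,
    num_devices=1,
    epochs=3,
    seconds_per_step=6.0,
)
print(hours)  # 6.25
```

Comparing this figure with the trainer's own estimate shows whether the step count (for example an unexpectedly large max_steps or number of epochs) or the cost of each step is responsible.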
0 votes, 0 answers

Hugging Face - Load/save PeftConfig as JSON

I am fine-tuning a Hugging Face model on my own data using LoRA. However, I do not want to upload the files to Hugging Face, but store them on my local computer. This works for the tokenizer and the model; however, the LoraConfig…
Andi Maier
0 votes, 0 answers

Error with get_peft_model() and PromptTuningConfig

I am learning how to perform prompt tuning and am running into a problem. I am using the get_peft_model function to initialize a model for training from 'google/flan-t5-base' model_name='google/flan-t5-base' tokenizer =…
0 votes, 1 answer

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' - PEFT Huggingface trying to run on CPU

I am relatively new to LLMs and trying to catch up. Following an example, I modified the code a bit to make sure I am running things locally on an EC2 instance. Training went OK on CPU only (27 hours), and I saved the model, tokenizer and configs to…
maop
0 votes, 0 answers

How to improve the output of a fine-tuned Open Llama 7b model for text generation?

I am trying to fine-tune an OpenLLaMA model with Hugging Face's PEFT and LoRA. I fine-tuned the model on a specific dataset. However, the output from model.generate() is very poor for the given input. When I give a whole sentence from the dataset…