I'm trying to load a large model on my local machine (MacBook Air M2) and offload some of the compute to the CPU, since my GPU isn't great. Here's my code:
from peft import PeftModel
from transformers import AutoTokenizer, GPTJForCausalLM, GenerationConfig
from transformers import BitsAndBytesConfig

# Allow int8 modules that don't fit on the GPU to be kept in fp32 on the CPU
quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

offload_folder = "/Users/matthewberman/Desktop/offload"

model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    device_map="auto",
    offload_folder=offload_folder,
    quantization_config=quantization_config,
)

model = PeftModel.from_pretrained(model, "samwit/dolly-lora", offload_dir=offload_folder)
However, I get this error:
ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules need to be offloaded: base_model.model.transformer.h.10, base_model.model.transformer.h.11, base_model.model.transformer.h.12, base_model.model.transformer.h.13, base_model.model.transformer.h.14, base_model.model.transformer.h.15, base_model.model.transformer.h.16, base_model.model.transformer.h.17, base_model.model.transformer.h.18, base_model.model.transformer.h.19, base_model.model.transformer.h.20, base_model.model.transformer.h.21, base_model.model.transformer.h.22, base_model.model.transformer.h.23, base_model.model.transformer.h.24, base_model.model.transformer.h.25, base_model.model.transformer.h.26, base_model.model.transformer.h.27, base_model.model.transformer.ln_f, base_model.model.lm_head.
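The base model itself loads without complaint before the PeftModel call. In case it's useful, this is the quick check I can run to see where each submodule ends up (my understanding is that hf_device_map gets filled in by accelerate when device_map="auto" is used):

# Print the device assigned to each submodule of the base model;
# anything mapped to "disk" is what accelerate wants an offload directory for.
for module_name, device in model.hf_device_map.items():
    print(module_name, "->", device)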
I'm definitely pointing to a valid offload directory, since the earlier from_pretrained call uses offload_folder successfully (I can see files being written there). A quick sanity check below confirms it.
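This just lists the contents of the same offload_folder defined above, using only the standard library:

import os

# Show what accelerate has written into the offload folder while loading
# the base model; the directory exists and is not empty at this point.
print(os.listdir(offload_folder))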
What am I doing wrong?