I am trying to fine-tune the chatglm-6b model with LoRA, using transformers and peft, on Kaggle GPUs (2×T4).
The standard loading method (AutoModel.from_pretrained) first materializes the full model (~15 GB) in CPU RAM, but Kaggle only provides 13 GB of CPU memory, so the model cannot be loaded that way.
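For reference, the standard call would look roughly like the sketch below (the exact kwargs are my assumption, mirroring the rest of the code); on Kaggle it is killed once the 13 GB of CPU RAM is exhausted.
from transformers import AutoModel

# Standard loading: the whole fp16 checkpoint is materialized in CPU RAM first,
# which exceeds Kaggle's 13 GB limit before the weights can reach the GPUs.
model = AutoModel.from_pretrained('THUDM/chatglm-6b', trust_remote_code=True).half()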
Thus, I used the load_checkpoint_and_dispatch() function from Accelerate to load the model:
from transformers import AutoTokenizer, AutoModel, AutoConfig
from accelerate import load_checkpoint_and_dispatch, init_empty_weights
from huggingface_hub import snapshot_download

# Download the checkpoint files without loading them into memory
FilePath = snapshot_download(repo_id='THUDM/chatglm-6b')
config = AutoConfig.from_pretrained(FilePath, load_in_8bit=True, trust_remote_code=True)

# Build the model skeleton on the meta device (no weight tensors allocated yet)
with init_empty_weights():
    model = AutoModel.from_config(config, trust_remote_code=True).half()

# Load the weights and dispatch them across the available devices,
# keeping each GLMBlock on a single device
model = load_checkpoint_and_dispatch(
    model, FilePath, device_map='auto', no_split_module_classes=["GLMBlock"]
)
With this method the model loads successfully, with its layers dispatched across the CPU and both GPUs.
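A quick way to verify the placement is to count the parameter devices after dispatch (just a rough check, not part of the training code):
from collections import Counter

# Count how many parameters ended up on each device after dispatch
print(Counter(str(p.device) for p in model.parameters()))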
Then I added the LoRA adapters with peft:
from peft import get_peft_model, LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False, r=32, lora_alpha=32, lora_dropout=0.1, bias='none',
    # other candidate target modules: ['dense', 'dense_h_to_4h', 'dense_4h_to_h']
    target_modules=['query_key_value'],
)
model = get_peft_model(model, peft_config)
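(As an optional sanity check, peft can report how many parameters are actually trainable after wrapping; it should be only the LoRA weights, a small fraction of the 6B total.)
# Optional: confirm that only the LoRA adapter weights are trainable
model.print_trainable_parameters()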
Now, the model can generate output directly with:
outputs = model(**tokenizer(['Hello world!'], return_tensors='pt').to(model.device))
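(The tokenizer used here is not shown above; assume it is created from the same downloaded checkpoint, e.g.:)
from transformers import AutoTokenizer

# Assumption: tokenizer built from the same ChatGLM checkpoint as the model
tokenizer = AutoTokenizer.from_pretrained(FilePath, trust_remote_code=True)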
However, after wrapping the model, dataloaders, and optimizer with accelerator.prepare(), I get a RuntimeError during training:
import torch
from tqdm import tqdm
from accelerate import Accelerator

accelerator = Accelerator()
train_dataloader, val_dataloader, model, optimizer = \
    accelerator.prepare(train_dataloader, val_dataloader, model, optimizer)

train_loss = []
epoch_correct_num, epoch_total_num = 0, 0
model.train()
for batch in tqdm(train_dataloader):
    labels = batch['labels']
    outputs = model(**batch)
    loss, logits = outputs.loss, outputs.logits
    optimizer.zero_grad()
    # loss.backward()  # replaced by accelerator.backward below
    accelerator.backward(loss)
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 2.0)
    optimizer.step()
    scheduler.step()  # lr scheduler created earlier (not shown)
Is there a way to solve this problem?