
I have fine-tuned the Llama 2 model, reloaded the base model, and merged the LoRA weights into it. I then saved the merged model and now intend to run it:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in fp16
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

# Attach the LoRA adapter and merge its weights into the base model
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Save the standalone merged model
model.save_pretrained("path/to/model")
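
To make path/to/model fully standalone, the tokenizer presumably needs to be saved alongside the weights as well; a minimal sketch, assuming the tokenizer is unchanged from the base checkpoint:

from transformers import AutoTokenizer

# Assumption: the tokenizer was not modified during fine-tuning, so the
# base checkpoint's tokenizer can be written next to the merged weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("path/to/model")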

Now, I would like to load the model from path/to/model using the following code:


import torch
import transformers

model_id = "path/to/model"  # directory containing the merged weights

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth,
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    device_map='auto',
    offload_folder="offload",
    torch_dtype=torch.float16,  # load weights in fp16
    use_auth_token=hf_auth,
    offload_state_dict=True,
)
model.eval()
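
For debugging, the placement that device_map='auto' actually chose can be inspected; a minimal sketch (hf_device_map is populated by accelerate when a device map is used):

# Modules mapped to 'cpu' or 'disk' here will not run on the GPU
print(model.hf_device_map)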

My intent in saving the merged model is to eliminate the dependency on the base model.

Problem

While running the model in Colab, I see no GPU usage; only the CPU is being used, and this crashes the runtime. What is causing the GPU to not be used?
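
For reference, a minimal check of whether PyTorch can see a GPU at all, assuming a standard Colab GPU runtime:

import torch

# If this prints False, device_map='auto' has no GPU to place weights on
# and everything falls back to CPU (and the offload folder)
print(torch.cuda.is_available())
print(torch.cuda.device_count())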
