I have fine-tuned the Llama 2 model, reloaded the base model, and merged the LoRA weights into it. I then saved the merged model, and now I intend to run it.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in fp16.
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

# Apply the LoRA adapter and merge it into the base weights.
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Save the merged model.
model.save_pretrained("path/to/model")
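To make path/to/model fully self-contained, the tokenizer can presumably be saved next to the merged weights as well (a minimal sketch, assuming model_name still points at the base checkpoint that shipped the tokenizer files):

from transformers import AutoTokenizer

# Save the tokenizer into the same directory as the merged weights,
# so loading later does not need the base checkpoint at all.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("path/to/model")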
Now I would like to load the merged model from path/to/model using the following code:
import torch
import transformers

# Load the config of the merged checkpoint.
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth,
)

# Load the merged weights; device_map='auto' lets accelerate decide
# how to place layers across GPU, CPU, and disk offload.
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    device_map='auto',
    offload_folder="offload",
    torch_dtype=torch.float16,
    use_auth_token=hf_auth,
    offload_state_dict=True,
)
model.eval()
My intent behind saving the merged model is to eliminate the dependency on the original base_model checkpoint.
Problem
While running the model in Colab, I see that there is no GPU usage and only the CPU is being used, which crashes the runtime. What is causing the GPU not to be used?
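For reference, a minimal check like the one below (assuming a standard torch/transformers setup) should show whether CUDA is even visible to the runtime and where device_map='auto' actually placed the layers:

import torch

# Is a CUDA device visible at all? If this prints False, the Colab
# runtime is not a GPU runtime (or the installed torch has no CUDA support).
print(torch.cuda.is_available())

# Where did accelerate place each module? Entries like 'cpu' or 'disk'
# mean those layers were offloaded instead of living on the GPU.
print(model.hf_device_map)

# Device of the first parameter, as a quick sanity check.
print(next(model.parameters()).device)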