I'm trying to load the whisper large-v2 model onto a GPU, but it seems that PyTorch first unpickles the whole model into CPU RAM, using more than 10 GB of memory, and only then moves it into GPU memory.
PyTorch's torch.load documentation also says that:
torch.load() uses Python’s unpickling facilities but treats storages, which underlie tensors, specially. They are first deserialized on the CPU and are then moved to the device they were saved from.
So the unpickling really does happen on the CPU.
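Here's a minimal sketch of how that host-side cost can be measured (this assumes a Linux box and the same checkpoint file as in the snippet below; resource.getrusage reports the peak resident set size, so it catches memory that is used during the call even if it is freed again afterwards):

import resource
import torch

# Peak resident set size of this process so far (kilobytes on Linux).
peak_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Even though the target device is CUDA, the storages are first deserialized
# on the CPU (as the docs say), so host RAM is consumed while the checkpoint
# is read, even though none of the tensors end up staying there.
checkpoint = torch.load("large-v2.pt", map_location="cuda")

peak_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak host RSS grew by ~{(peak_after - peak_before) / 1e6:.1f} GB")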
Because this model will be running in the cloud, it seems wrong to pay for a VM with extra RAM just so the model can be loaded into CPU RAM and then moved to the GPU. Is there a way to load it directly into the GPU without first loading the whole model into CPU RAM?
I'm currently using whisper's load_model function, which basically does this:
import torch
from whisper import Whisper, ModelDimensions

checkpoint_file = "large-v2.pt"

# torch.load unpickles the checkpoint; the tensor storages are materialized
# on the CPU before being moved to the device given by map_location.
with open(checkpoint_file, "rb") as fp:
    checkpoint = torch.load(fp, map_location="cuda")
del checkpoint_file

dims = ModelDimensions(**checkpoint["dims"])
model = Whisper(dims)
model.load_state_dict(checkpoint["model_state_dict"])
model.to("cuda")
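If it helps frame an answer: newer PyTorch releases added an mmap argument to torch.load, and something along these lines is the kind of approach I'm hoping for, though I haven't verified that it actually avoids the CPU RAM spike for this checkpoint:

import torch

# mmap=True memory-maps the checkpoint file instead of reading it fully into
# a private CPU buffer. I'm not sure whether this avoids the >10 GB resident
# spike before the tensors reach the GPU, or how well it combines with
# map_location="cuda", so treat this as a sketch rather than a solution.
checkpoint = torch.load("large-v2.pt", map_location="cuda", mmap=True)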