I have trained a language model using transformer-lm, which is built on PyTorch, and I would like to deploy the resulting model to the Google Cloud Platform as a Cloud Function. Cloud Functions are limited to 2 GB of memory.
The problem is that loading the model uses too much memory and the function is killed with a "memory limit exceeded" error. The model.pt file is 1.32 GB, and I load it with
torch.load(model_path / 'model.pt', map_location='cpu')
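To confirm that the torch.load call itself is what blows the limit, this is roughly how the memory cost of the load can be measured locally (a quick sketch; psutil is only used for the measurement, and model_path is a placeholder for wherever model.pt lives):

import os
import psutil
import torch
from pathlib import Path

model_path = Path('.')  # placeholder: directory that holds model.pt

proc = psutil.Process(os.getpid())
rss_before = proc.memory_info().rss
model = torch.load(model_path / 'model.pt', map_location='cpu')
rss_after = proc.memory_info().rss
print(f'torch.load added {(rss_after - rss_before) / 2**30:.2f} GiB of resident memory')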
Is there a way to i) compress the model or ii) avoid loading the full model into memory at once? Or is there any other way to make this run on GCP?
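For i), would converting the weights to half precision before saving be a viable approach? Something like the following offline conversion is what I have in mind (a rough sketch; I am assuming here that model.pt holds a plain state dict of tensors and that inference still works with float16 weights):

import torch

# Run this offline on a machine with enough RAM, not inside the Cloud Function.
state = torch.load('model.pt', map_location='cpu')

# Cast every floating-point tensor to float16, roughly halving both the
# file size and the RAM needed to hold the weights.
state_fp16 = {
    k: v.half() if torch.is_tensor(v) and v.is_floating_point() else v
    for k, v in state.items()
}

torch.save(state_fp16, 'model_fp16.pt')

That should bring the checkpoint down to roughly 0.66 GB, though I am not sure whether transformer-lm's inference code accepts float16 weights on CPU (the tensors could be cast back to float32 after loading, but that would restore most of the memory cost).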