
I have trained a language model with transformer-lm, which uses PyTorch. I would like to deploy the resulting model to Google Cloud Platform as a Cloud Function. Cloud Functions are limited to 2 GB of memory.

The problem is that loading the model exceeds the 2 GB limit and the function crashes with a "memory limit exceeded" error. The model.pt file is 1.32 GB, and I use

```python
torch.load(model_path / 'model.pt', map_location='cpu')
```

to load the model. Is there a way to (i) compress the model, (ii) avoid loading the full model at once, or any other way to make it run on GCP?
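To make (i) concrete, I was thinking of something along these lines, casting the weights to half precision before saving. This is purely a sketch: it assumes model.pt holds a full `nn.Module` (which the `torch.load` call above suggests), and I have not verified the effect on transformer-lm's output quality.

```python
# Illustrative sketch of option (i): store the weights in float16,
# roughly halving the checkpoint size. Assumes model.pt contains a
# full nn.Module; accuracy impact is untested.
import torch

model = torch.load('model.pt', map_location='cpu')
model.half()  # cast all floating-point parameters to float16
torch.save(model, 'model_fp16.pt')
```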

navige

1 Answer


Cloud Functions spin up server instances when triggered. If a function is not called for some time, its instance is terminated. However, if you call the function again while the instance is still running, the same instance, along with whatever is left in your execution environment, is reused. This can cause your function to crash.
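As a minimal sketch of that reuse (the entry-point name is hypothetical): module-level state survives between "warm" invocations of the same instance, so anything your code leaves behind keeps occupying memory.

```python
# Hypothetical Cloud Function entry point showing that module-level
# state survives warm invocations: the counter resets only on a cold start.
calls = 0  # lives as long as the instance does

def handler(request):
    global calls
    calls += 1
    return f"invocation {calls} on this instance"
```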

To avoid that, you might want to trigger garbage collection explicitly. In Python, you can use the `gc` module for that; in particular, you can call `gc.collect()` to free memory.
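A minimal sketch of what that could look like, assuming the model is only needed for one prediction per invocation (`handler` and `run_inference` are hypothetical placeholders):

```python
import gc
import torch

def handler(request):
    model = torch.load('model.pt', map_location='cpu')
    result = run_inference(model, request)  # hypothetical helper
    # Drop the only reference and force a collection so the memory is
    # released before this instance serves the next invocation.
    del model
    gc.collect()
    return result
```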

Deniss T.
  • unfortunately, `gc.collect()` did not help; the function crashes even on a single run – navige Oct 16 '19 at 10:03
  • In this case, I'd recommend testing your code locally for [memory leaks/spikes](https://medium.com/zendesk-engineering/hunting-for-memory-leaks-in-python-applications-6824d0518774); a minimal local-profiling sketch follows these comments. You might also want to take a look at this [post](https://stackoverflow.com/questions/1435415/python-memory-leaks#answer-1435426) about Python memory leaks. – Deniss T. Oct 17 '19 at 19:11
  • Keep in mind that Cloud Functions are meant for simple, single-purpose functions attached to events emitted from your cloud infrastructure and services. For more complex functionality, you might need **Google App Engine** or **Google Compute Engine**. – Deniss T. Oct 17 '19 at 19:18
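As referenced in the comments, here is a minimal sketch of local memory profiling with the standard-library `tracemalloc` module. Note that `tracemalloc` only sees allocations made through Python's allocator, so tensor storage allocated natively by PyTorch may be undercounted.

```python
import tracemalloc
import torch

tracemalloc.start()
model = torch.load('model.pt', map_location='cpu')

current, peak = tracemalloc.get_traced_memory()  # both values in bytes
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
tracemalloc.stop()
```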