
I would like to use Gunicorn's preload feature to save memory while running PaddlePaddle GPU for inference. With preload enabled, CUDA cannot be initialised properly, because CUDA is initialised as soon as paddle is imported.

From the paddle GitHub repository:

initialization operation of CUDA occurs before the process is forked, causing the newly started process to not obtain the result of CUDA initialization, thus causing a crash during prediction.

They recommended using Flask's server instead, but I would like to keep using the preload feature in Gunicorn.

Is there any workaround for this problem? Thanks!
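One direction I have considered (a sketch only, not verified against paddle): keep preload enabled but defer the CUDA-touching import until after each worker has forked, using Gunicorn's `post_fork` server hook in `gunicorn.conf.py`. The module-level lazy import is an assumption on my part; the actual app layout would need to match it.

```python
# gunicorn.conf.py -- minimal sketch, assuming the app module itself
# does NOT import paddle at top level (that import is what triggers
# CUDA initialisation in the master before fork()).

preload_app = True  # app code is still imported once in the master


def post_fork(server, worker):
    # Runs inside each worker process after fork(), so CUDA is
    # initialised fresh per worker instead of being inherited
    # (broken) from the master process.
    import paddle  # deferred import: CUDA init happens here

    worker.log.info("paddle imported in worker pid %s", worker.pid)
```

The trade-off is that the paddle model itself is then loaded per worker, so the memory saving from preload only applies to the rest of the application, not to the CUDA/model state.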
