
Short context: Our application has a backend written in Python. It exposes a few REST API endpoints and handles message queues (RabbitMQ via Pika). We use Python because this is a Data Science/AI project, so a lot of the data processing requires DS expertise.

Problem: Some parts of our application run CPU-heavy calculations, and we use multiprocessing to parallelize them. However, we need to be careful, because each process starts a new interpreter and imports everything again. The environment is Windows, so the process start method is "spawn".
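
For illustration, a minimal sketch of the structure we try to enforce (module and function names are made up):

```python
# main.py -- illustrative only
import multiprocessing as mp

# Anything at module level runs again in every spawned child,
# because "spawn" re-imports this module in a fresh interpreter,
# so heavy objects should NOT be created here.

def cpu_heavy(x: int) -> int:
    # stand-in for the real CPU-bound calculation
    return sum(i * i for i in range(x))

def main() -> None:
    with mp.Pool(processes=4) as pool:
        results = pool.map(cpu_heavy, [10_000, 20_000, 30_000])
    print(results)

if __name__ == "__main__":
    # Without this guard, spawning children on Windows re-executes
    # the module top level and tries to create pools recursively.
    main()
```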

Question: What is the best way to keep this under control? The team is large, so there is a chance that someone will add a large object creation or a long-running function that executes at application boot, stays in memory, and then gets re-imported (or duplicated) every time a pool of processes is created.
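
One pattern we are considering is per-process lazy initialization via a Pool initializer, so expensive setup never runs at import time in the parent. A rough sketch (loader and names are hypothetical):

```python
# workers.py -- sketch of lazy, per-worker initialization
import multiprocessing as mp

_model = None  # built lazily, never at import time

def load_big_model():
    # hypothetical stand-in for loading a large DS/AI artifact
    return {"weights": list(range(1_000_000))}

def _init_worker() -> None:
    # Runs once in each worker right after it is spawned,
    # so the parent never builds this object at import time.
    global _model
    _model = load_big_model()

def predict(x: int) -> int:
    return x * len(_model["weights"])

if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=_init_worker) as pool:
        print(pool.map(predict, [1, 2, 3]))
```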

chmtomasz
  • Perhaps relevant?: https://stackoverflow.com/questions/48680134/how-to-avoid-double-imports-with-the-python-multiprocessing-module/72802152#72802152. You'll just have to be careful when using multiprocessing – Charchit Agarwal Jul 09 '22 at 13:48
  • Keep the "main" file as simple as possible as anything imported in "main" will also be imported by the children (in addition to everything necessary to call the target function). That said, it is not common practice to use delayed imports with python, so probably most things will get imported anyway.. All that said, you should worry more about reducing global state, as data is frequently much larger than code anyway. I frequently put read-only numpy arrays in `mp.shared_memory` to eliminate copying. – Aaron Jul 09 '22 at 21:24

0 Answers