1

I am running a python program that processes a large dataset. Sometimes, it runs into a MemoryError when the machine runs out of memory.

I would like any MemoryError that is going to occur to happen at the start of execution, not in the middle. That is, the program should fail-fast: if the machine will not have enough memory to run to completion, the program should fail as soon as possible.

Is it possible for Python to pre-allocate space on the heap?

Owen
  • 38,836
  • 14
  • 95
  • 125
  • I don't think 64-bit python imposes an artificial heap limit like Java... Of course 32-bit python has an address space limit. How big is the data, how big is your RAM, and how much swap space have you allocated? – Owen Dec 30 '20 at 04:14
  • Hi Owen - Thanks for your suggestion on Swap memory. I have no Swap memory allocated. Direction I am tending towards is, check and fail fast if pre-conditions are not met for python interpreter to start. – Yogananth Mahalingam Dec 30 '20 at 05:26
  • Oh, I see, I misunderstood the original intent of your question. I don't know the answer. But be warned that even if Python allocates memory on the heap, [that doesn't necessarily mean the memory exists](https://stackoverflow.com/questions/1655650/linux-optimistic-malloc-will-new-always-throw-when-out-of-memory). – Owen Dec 30 '20 at 06:07
  • I made a significant edit to try to clarify what you are asking. If I got it wrong, please undo my edit. – Owen Dec 30 '20 at 06:12
  • Can you deduce required amount of RAM from size of your input? It must not be precise - some rough estimation would be OK for your purposes. I would do practical experiments, find some kind of pattern, than evaluate on start free RAM, exit if RAM is less that prognosed. – Alex Yu Dec 30 '20 at 06:25
  • If you can, a simple solution is perhaps just try to evaluate `byte([0])*N`... then delete it later. – user202729 Dec 30 '20 at 07:05
  • If python interpreter does not have such a support, then will explore other options for scheduling. Here are the candidates came to mind: 1. A simplisitic approach: Find and allocate the smallest VM from the pool for the required Hardware needs of the pipeline 2. Go for a Container (Kubernetes, Docker etc) based solution - where resources can be guaranteed without conflict with other running jobs. Thanks for all your expert advice, timely help and willingness to share your time in finding the solution. – Yogananth Mahalingam Dec 30 '20 at 07:20

1 Answers1

0

Is it possible to allocate heap memory at the start of python process,

Python uses as much memory as needed, so if your program is running out of memory, it would still run out of it even if there was a way to allocate the memory at the start.

One solution is trying to allow for swap to increase your total memory, although performance will be very bad in many scenarios.

The best solution, if possible, is to change the program to process data in chunks instead of loading it entirely.

Acorn
  • 24,970
  • 5
  • 40
  • 69
  • Thanks Acorn for helping with answer. I have a farm of servers and this farm is used by group of independent jobs. As free memory available varies from node to node, I want my pipeline to fail fast if the pre-requisites are not met - like memory needs. Hence looking for option to pre-allocate resources, so that scheduler can decide if the node is fully occupied or not. Does Python have support only for on-demand heap allocation? Is there any option to pre-allocate heap at all? – Yogananth Mahalingam Dec 30 '20 at 05:20