
I'm not going to post much code; rather, I'll try to describe the issue I ran into. My program takes a list of tuples as input:

input = [(a, b, c), (d, e, f, g)]

For each tuple in the list, a list of tuples is generated by my function f() and dumped into a pickle file. This list can be small (1 element) but also very large (thousands or even millions of elements). The largest .pkl file I got is about 9 GB.

This step can be summed up like this:

for elt in input:
    f(elt)

The function f() calls various methods, objects and functions, and can take quite a large amount of RAM. It turns out that I run out of RAM for just a few of the elements in input. When that happens, I would like to skip them and go to the next one (to avoid a program crash and to still do the computation for the inputs that come after the one that can't be done).

i.e.:

for elt in input:
    try:
        f(elt)
    except MemoryError:
        continue

My problem is that I have read that memory errors are quite nasty and can't always be recovered from.

What is the best way to implement this try/except safety? Is there a way to purge the memory between the loop iterations?
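
To make the second part of the question concrete, here is a rough sketch of what I have in mind (`inputs` and `f` stand in for my real names); I simply don't know whether catching `MemoryError` like this is reliable, or whether the `gc.collect()` call actually purges anything useful:

import gc

for elt in inputs:
    try:
        f(elt)
    except MemoryError:
        # Skip this element and move on to the next one
        print("Skipping", elt, "- not enough memory")
    finally:
        # Naive attempt to purge memory between iterations
        gc.collect()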

EDIT for clarification:

One more point. I ran the program on a PC with 128 GB of RAM. However, I did not run it only once: I ran it once in one thread on input1, once in another thread on input2, and so on, with 10 of them running. The combination of threads loaded too much into memory at a given instant. However, by going to the next iteration on some of the threads (with the safety), this should be avoided.

The main asset of this method (instead of reducing the number of threads) is that if I run the program on a laptop with 8 GB of RAM, in only one thread, it will still work. If one of the iterations needs more than the available RAM (which will happen on a laptop with 8 GB of RAM), it just skips it and goes to the next one.

The method seems straightforward to me, but I don't know how to implement it in Python, since recovering from a memory error is not reliable.
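
The only alternative I can think of is to run each call to f() in a short-lived worker process, so that an out-of-memory failure kills the child instead of the main program. This is just a sketch of the idea (again with `inputs` and `f` as placeholders), not code I have actually validated:

import multiprocessing as mp

def run_one(elt):
    f(elt)  # f() writes its result to a .pkl file, so nothing needs to be returned

if __name__ == "__main__":
    for elt in inputs:
        p = mp.Process(target=run_one, args=(elt,))
        p.start()
        p.join()
        if p.exitcode != 0:
            # The worker died (e.g. killed by the OS when memory ran out):
            # skip this element and go on to the next one.
            print("Skipping", elt, "- worker exited with code", p.exitcode)
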

Mathieu
  • Can you figure out which elements are problematic _before_ you run out of memory? – khelwood May 01 '18 at 08:13
  • @khelwood No, I have no clue about which one will produce a large file, and which one will produce a small file. – Mathieu May 01 '18 at 08:14
  • Sounds tricky. Can't you modify `f` to somehow look at the size of the items that it's processing so you can get a rough estimate of how big the final items will be? – PM 2Ring May 01 '18 at 08:25
  • BTW, using `input` as a variable name is a bad idea since it shadows the built-in `input` function, and adds unnecessary confusion to your code. – PM 2Ring May 01 '18 at 08:25
  • @PM2Ring I know, and it is not the name I use in my code. I used it here for the clarity of the question. And secondly, yes and no. Indeed my function f() could look at the size of the list it generates and stop at some point, but I have no idea at which point it should stop, and it will depend on the other processes as well... – Mathieu May 01 '18 at 08:26
  • Are the elements created by `f` always the same type? If they are, then you might be able to save a substantial amount of memory by storing them in a more memory-efficient sequence than a list. – shuttle87 May 01 '18 at 08:26
  • @shuttle87 I know that my memory management is not efficient, and that I will have to change it at some point. But that would require a huge modification of thousands of lines. I have no time to develop a new version right now; I would just like to implement a safety that avoids the issue. – Mathieu May 01 '18 at 08:34
  • I think you should focus on understanding why f is taking so much memory. If f calls several functions that are written in a memory-hungry way, then there is no way to prevent this program from crashing. So you should look at why f consumes so much RAM and improve it with checks so that it never exhausts memory. A try/except won't help otherwise. – Ankit Vallecha May 01 '18 at 08:34
  • @AnkitVallecha So you mean that if it runs out of memory on one thread at a given instant, there is no way to tell that thread: "Hey, skip that iteration, and go to the next input"? That seems weird to me. – Mathieu May 01 '18 at 08:38
    "I used it here for the clarity of the question." I had the feeling that was the case, I was just pointing out that it has the opposite effect, since for most long-time Python coders the meaning of `input` is hard-wired into our brains. ;) – PM 2Ring May 01 '18 at 08:41
  • @PM2Ring I'll keep it in mind for other questions :) – Mathieu May 01 '18 at 08:43
  • I've just seen your edit. [This question](https://stackoverflow.com/questions/938733/total-memory-used-by-python-process) shows how you can find how much memory is being used; you can use that to decide whether or not it's safe to start a new thread. – PM 2Ring May 01 '18 at 08:45
  • What you can do is have a check in your program: if the memory consumed is below some threshold, proceed, otherwise exit. You can use psutil to get the memory used (a rough sketch of this idea is shown after these comments). – Ankit Vallecha May 01 '18 at 08:50
  • @AnkitVallecha OK, yeah, it does seem that psutil could help. The only issue is that if an iteration starts with only 1 GB left, maybe that's actually more than enough: some of the iterations will use a few kB of RAM and some will use more than 20 GB. But I have no clue which one uses how much... – Mathieu May 01 '18 at 09:03
  • @PM2Ring True, a safety level could be decided. However, same issue as the one I describe in the comment above. Second, I didn't actually code any multithreading in the program: I'm just running several .py files of the same program with different inputs at the same time, i.e. no interaction between the processes, and I can't modify much on that part yet. – Mathieu May 01 '18 at 09:05
  • The last time I ran into issues with memory exhaustion I ended up writing this library: https://github.com/JaggedVerge/mmap_backed_array if you can provide some more information about exactly what's stored in the array it might be possible to suggest some alternative approaches for the memory management. In general I'd rather come up with a way to avoid a `MemoryError` than try to recover from one. – shuttle87 May 01 '18 at 11:48
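
A rough illustration of the psutil-based check suggested in the comments above; the threshold `MIN_FREE_BYTES` is arbitrary and, as Mathieu points out, hard to choose when individual iterations range from a few kB to over 20 GB, and `inputs` and `f` are again placeholders:

import psutil

MIN_FREE_BYTES = 10 * 1024**3  # arbitrary safety margin (10 GB)

for elt in inputs:
    available = psutil.virtual_memory().available
    if available < MIN_FREE_BYTES:
        # Not enough headroom right now: skip this element instead of risking an OOM
        print("Skipping", elt, "- only", available, "bytes available")
        continue
    f(elt)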

0 Answers