Python IOError cannot allocate memory although there is plenty

Question

I've written a basic program that walks a directory tree containing many JPEG files (500,000+), verifies that they are not corrupted (approximately 3-5% of the files seem to be corrupt in some way), then takes a SHA-1 sum of each file (even the corrupt ones) and saves the info into a database.

The JPEG files in question are located on a Windows system and mounted on the Linux box via CIFS. They are mostly around 4 megabytes in size, although some may be slightly larger or smaller.

When I run the program it seems to work fairly well for a while and then it falls over with the error below. This was after it had processed approximately 1100 files (the error indicated that the problem occurred when attempting to open a file of 4.5 MB).

I understand that I can catch this error and continue or retry, but I'm curious why it occurs in the first place, and whether catching and retrying will actually solve the problem or just get stuck retrying (unless I limit the retries, of course, but then a file gets skipped).

I'm using "Python 2.7.5+" on a Debian system to run this. The system has at least 4 GB (possibly 8 GB) of RAM, and top reports that the script uses less than 1% of the RAM and less than 3% of the CPU at any time while it is running. Similarly, jpeginfo, which this script runs, uses equally small amounts of memory and CPU.

To avoid using too much memory when reading files in I have taken the approach given in this answer to another question: https://stackoverflow.com/a/1131255/289545

Also you may note that the "jpeginfo" command is in a while loop looking for an "[OK]" response. This is because if "jpeginfo" thinks it can't find the file it returns a 0 and so it is not considered an error state by the subprocess.check_output call.

I did wonder if the fact that jpeginfo seems to fail to find certain files on the first try could be related (and I suspect it is) but the error returned says cannot allocate memory rather than file not found.

The Error:

Traceback (most recent call last):
  File "/home/m3z/jpeg_tester", line 95, in <module>
    main()
  File "/home/m3z/jpeg_tester", line 32, in __init__
    self.recurse(self.args.dir, self.scan)
  File "/home/m3z/jpeg_tester", line 87, in recurse
    cmd(os.path.join(root, name))
  File "/home/m3z/jpeg_tester", line 69, in scan
    with open(filepath) as f:
IOError: [Errno 12] Cannot allocate memory: '/path/to/file name.jpg'

The full program code:

  1 #!/usr/bin/env python
  2
  3 import os
  4 import time
  5 import subprocess
  6 import argparse
  7 import hashlib
  8 import oursql as sql
  9
 10
 11
 12 class main:
 13     def __init__(self):
 14         parser = argparse.ArgumentParser(description='Check jpeg files in a given directory for errors')
 15         parser.add_argument('dir',action='store', help="absolute path to the directory to check")
 16         parser.add_argument('-r, --recurse', dest="recurse", action='store_true', help="should we check subdirectories")
 17         parser.add_argument('-s, --scan', dest="scan", action='store_true', help="initiate scan?")
 18         parser.add_argument('-i, --index', dest="index", action='store_true', help="should we index the files?")
 19
 20         self.args = parser.parse_args()
 21         self.results = []
 22
 23         if not self.args.dir.startswith("/"):
 24                 print "dir must be absolute"
 25                 quit()
 26
 27         if self.args.index:
 28                 self.db = sql.connect(host="localhost",user="...",passwd="...",db="fileindex")
 29                 self.cursor = self.db.cursor()
 30
 31         if self.args.recurse:
 32                 self.recurse(self.args.dir, self.scan)
 33         else:
 34                 self.scan(self.args.dir)
 35
 36         if self.db:
 37                 self.db.close()
 38
 39         for line in self.results:
 40                 print line
 41
 42
 43
 44     def scan(self, dirpath):
 45         print "Scanning %s" % (dirpath)
 46         filelist = os.listdir(dirpath)
 47         filelist.sort()
 48         total = len(filelist)
 49         index = 0
 50         for filen in filelist:
 51                 if filen.lower().endswith(".jpg") or filen.lower().endswith(".jpeg"):
 52                         filepath = os.path.join(dirpath, filen)
 53                         index = index+1
 54                         if self.args.scan:
 55                                 try:
 56                                         procresult = subprocess.check_output(['jpeginfo','-c',filepath]).strip()
 57                                         while "[OK]" not in procresult:
 58                                                 time.sleep(0.5)
 59                                                 print "\tRetrying %s" % (filepath)
 60                                                 procresult = subprocess.check_output(['jpeginfo','-c',filepath]).strip()
 61                                         print "%s/%s: %s" % ('{:>5}'.format(str(index)),total,procresult)
 62                                 except subprocess.CalledProcessError, e:
 63                                         os.renames(filepath, os.path.join(dirpath, "dodgy",filen))
 64                                         filepath = os.path.join(dirpath, "dodgy", filen)
 65                                         self.results.append("Trouble with: %s" % (filepath))
 66                                         print "%s/%s: %s" % ('{:>5}'.format(str(index)),total,e.output.strip())
 67                         if self.args.index:
 68                                 sha1 = hashlib.sha1()
 69                                 with open(filepath) as f:
 70                                         while True:
 71                                                 data = f.read(8192)
 72                                                 if not data:
 73                                                         break
 74                                                 sha1.update(data)
 75                                 sqlcmd = ("INSERT INTO `index` (`sha1`,`path`,`filename`) VALUES (?, ?, ?);", (buffer(sha1.digest()), dirpath, filen))
 76                                 self.cursor.execute(*sqlcmd)
 77
 78
 79     def recurse(self, dirpath, cmd, on_files=False):
 80         for root, dirs, files in os.walk(dirpath):
 81             if on_files:
 82                 for name in files:
 83                     cmd(os.path.join(root, name))
 84             else:
 85                 cmd(root)
 86                 for name in dirs:
 87                     cmd(os.path.join(root, name))
 88
 89
 90
 91
 92
 93
 94 if __name__ == "__main__":
 95     main()

Your program still has a lot of memory, but it may have run out of other resources. Maybe file descriptors? Do you still get the exception if you comment out the subprocess calls? — Christian Hudon, Jul 04 '13 at 13:49
Don't you need to close the file for `with open(filepath) as f` with `f.close()`? Pardon me as I am new to python as well. — shahkalpesh, Jul 04 '13 at 13:58
@shahkalpesh: no, the `with` takes care of that as soon as you leave the block. — RickyA, Jul 04 '13 at 14:05

Cartroo · Accepted Answer · 2013-07-04T15:33:01.407

It looks to me like Python is just passing on an error from the underlying open() call and the real culprit here is the Linux CIFS support - I doubt Python would be synthesizing ENOMEM unless system memory was truly exhausted (and probably even then I'd expect the Linux OOM killer to be invoked instead of getting ENOMEM).

Unfortunately it might take something of a Linux filesystem expert to figure out what's going on there. Looking at the sources for CIFS in the Linux kernel, I can see a variety of places where ENOMEM is returned when some kernel-specific resource is exhausted, as opposed to total system memory, but I'm not familiar enough with that code to say how likely any of them are.

To rule out anything Python-specific you can run the process under strace so you can see the exact return code that Python is getting from Linux. To do this, run your command something like this:

strace -eopen -f python myscript.py myarg1 myarg2 2>strace.log

The -f will follow child processes (i.e. the jpeginfo commands that you run) and the -eopen will only show you open() calls as opposed to all system calls (which is what strace does by default). This could generate a reasonable amount of output, which is why I've redirected it to a file in the above example, but you can leave it displaying on your terminal if you prefer.

I would expect you'd see something like this just before you get your exception:

open("/path/to/file name.jpg", O_RDONLY) = -1 ENOMEM (Cannot allocate memory)

If so, this error is coming straight from the filesystem open() call and there's very little you can do about it in your Python script. You could catch the exception and retry (perhaps after a short delay) as you're already doing if jpeginfo fails, but it's hard to say how successful this strategy will be without knowing what's causing the errors in the first place.
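A minimal sketch of that catch-and-retry idea, built around the hashing loop from the question's scan method. The function name `sha1_with_retry` and the `retries`, `delay` and `opener` parameters are hypothetical, introduced only for illustration. It retries only on `errno.ENOMEM` and re-raises anything else, so a genuinely missing file is not retried forever:

```python
import errno
import hashlib
import time


def sha1_with_retry(filepath, retries=5, delay=0.5, opener=open):
    """Return the SHA-1 hex digest of filepath, retrying on ENOMEM.

    retries, delay and opener are illustrative parameters, not part of
    the original script.
    """
    for attempt in range(1, retries + 1):
        sha1 = hashlib.sha1()  # fresh hash object for every attempt
        try:
            with opener(filepath, "rb") as f:
                while True:
                    data = f.read(8192)
                    if not data:
                        break
                    sha1.update(data)
            return sha1.hexdigest()
        except IOError as e:
            # Retry only the error seen in the traceback; re-raise any
            # other error, and give up once the retry budget is spent.
            if e.errno != errno.ENOMEM or attempt == retries:
                raise
            time.sleep(delay)
```

Bounding the retries means a persistently failing file eventually raises, so the caller can log it and move on rather than looping indefinitely.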

You could, of course, copy the files locally, but it sounds like that would be a serious pain as there are so many.
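If copying does turn out to be worthwhile, it need not be done in bulk. The sketch below copies a single file to a local temporary directory, runs a check on the local copy, and cleans up afterwards. The helper name `with_local_copy` and its `check` callback are hypothetical. Note that the copy itself still reads the file over CIFS, so it can hit the same error and may need its own retry:

```python
import os
import shutil
import tempfile


def with_local_copy(remote_path, check):
    """Copy remote_path to a local temp dir, run check(local_path), clean up.

    check is any callable taking the local path, for example a function
    that runs jpeginfo or computes a hash.
    """
    tmpdir = tempfile.mkdtemp()
    try:
        local_path = os.path.join(tmpdir, os.path.basename(remote_path))
        shutil.copyfile(remote_path, local_path)
        return check(local_path)
    finally:
        # Remove the temporary copy even if the copy or the check failed.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

This way each file crosses the network once, and both the integrity check and the hash can then work from the local copy.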

EDIT: As an aside, expect to see lots of open() calls which have nothing to do with your script, because strace is tracing every call made by Python, which includes it opening its own .py and .pyc files, for example. Just ignore the ones which don't refer to the files you're interested in.
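Since the log can get large, a small filter may help. The following is a rough sketch that assumes the log lines look like the `open(...) = -1 ENOMEM (...)` example above, optionally prefixed with `[pid N]` because of `-f`. The function name `failed_opens` and the regular expression are my own, not anything strace provides, and the pattern does not handle paths containing escaped quotes:

```python
import re

# Matches the path and errno name in lines such as:
#   [pid  1234] open("/path/to/file name.jpg", O_RDONLY) = -1 ENOMEM (Cannot allocate memory)
FAILED_OPEN = re.compile(r'open\("(?P<path>.*?)",.*\)\s+= -1 (?P<err>E[A-Z0-9]+)')


def failed_opens(lines, suffixes=(".jpg", ".jpeg")):
    """Yield (path, errno_name) for failed open() calls on matching files."""
    for line in lines:
        m = FAILED_OPEN.search(line)
        if m and m.group("path").lower().endswith(suffixes):
            yield m.group("path"), m.group("err")
```

To use it, open strace.log and iterate over `failed_opens(log_file)`. Filtering on the file suffix drops the failed lookups Python makes for its own .py and .pyc files.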

I've done as you suggest but the strace log file is massive and I haven't had a chance to go through it yet. I've also rewritten my program to retry several times with a delay. The problems seem to resolve themselves after a half second delay. Thanks — m3z, Jul 05 '13 at 20:15
Having looked through the strace, it was the filesystem returning ENOMEM as you predicted. Thanks — m3z, Jul 07 '13 at 11:04
You're very welcome - sorry it's not a problem with an easy solution! If you can't solve the problem with CIFS then perhaps you could copy the files over to the Linux machine one at a time to be checked, or possibly in batches, but that's going to make things rather slower. — Cartroo, Jul 08 '13 at 10:41
