
I have the following code to create thumbnails and save images. However, after about 1,000 items it fails with a "too many open files" error. Where is this coming from, and how would I fix the code?

import random
import shlex
import subprocess
import time
import uuid

import requests
from PIL import Image

# Note: S3_CMD, FNULL and the util module are defined elsewhere in the
# project; download_file lives in util but is reproduced here for reference.

def download_file(url, extension='jpg'):
    """ Download a large file.  Return path to saved file.
    """
    req = requests.get(url)
    if not req.ok:
        return None

    guid = str(uuid.uuid4())
    tmp_filename = '/tmp/%s.%s' % (guid, extension)
    with open(tmp_filename, 'wb') as f:  # binary mode, since the payload is image data
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                f.flush()
    return tmp_filename


def update_artwork_item(item):

    # Download the file
    tmp_filename = util.download_file(item.artwork_url)

    # Create thumbs
    THUMB_SIZES = [(1000, 120), (1000, 30)]
    guid = str(uuid.uuid4())
    S3_BASE_URL = 'https://s3-us-west-1.amazonaws.com/xxx/'

    try:

        for size in THUMB_SIZES:
            outfile = '%s_%s.jpg' % (guid, size[1])
            img = Image.open(tmp_filename).convert('RGB')
            img.thumbnail(size, Image.ANTIALIAS)
            img.save(outfile, "JPEG")
            s3_cmd = '%s %s premiere-avails --norr --public' % (S3_CMD, outfile) ## doesn't work half the time
            x = subprocess.check_call(shlex.split(s3_cmd))
            if x: raise
            subprocess.call(['rm', outfile], stdout=FNULL, stderr=subprocess.STDOUT)

    except Exception as e:

        print '&&&&&&&&&&', Exception, e

    else:
        # Save the artwork icons
        item.artwork_120 = S3_BASE_URL + guid + '_120.jpg'
        item.artwork_30 = S3_BASE_URL + guid + '_30.jpg'

        # hack to fix parallel saving
        while True:
            try:
                item.save()
            except Exception as e:
                print '******************', Exception, e
                time.sleep(random.random()*1e-1)
                continue
            else:
                subprocess.call(['rm', tmp_filename], stdout=FNULL, stderr=subprocess.STDOUT)
                break
David542
  • Have you tried adding img.close() after subprocess.call()? – Ciprum Apr 11 '16 at 18:34
  • There look to be a couple of locations that could be causing the issue: 1) I would suggest checking `util.download_file` and ensuring it is closing your http(s) requests. 2) Ensure `img.save(outfile, "JPEG")` closes the file. – Cory Shay Apr 11 '16 at 18:35
  • @CoryShay please see updated question with the `download_file()` function in there now. – David542 Apr 11 '16 at 18:37
  • This may help (maybe dupe?): http://stackoverflow.com/questions/2023608/check-what-files-are-open-in-python – Claudiu Apr 11 '16 at 18:38 (a quick sketch of this idea appears after these comments)
  • @David542 I would suggest seeing what files are open through @Claudiu's comment and also adding `req.close()` to `download_file` before returning `tmp_filename`. – Cory Shay Apr 11 '16 at 18:43
  • Instead of _shelling out_, use `shutil` and `os` modules instead. They will properly take care of lower level `subprocess` stuff if they need to. – C Panda Apr 11 '16 at 18:47
  • @CPanda: I think in general it's recommended to do `subprocess` instead of `os.system` (if that's what you had in mind with `os` module), from the docs: "The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using [os.system()]. " – Claudiu Apr 11 '16 at 19:21
  • @Claudiu No, I meant filesystem utilities, like `os.unlink` etc. – C Panda Apr 11 '16 at 19:29
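
Following the link in Claudiu's comment, a quick way to see which handles are piling up is to read the process's own fd table. This is a minimal, Linux-only sketch (the helper name is made up, not taken from the linked question):

import os

def list_open_files():
    """Print every file descriptor this process currently has open (Linux)."""
    fd_dir = '/proc/self/fd'
    for fd in sorted(os.listdir(fd_dir), key=int):
        try:
            # Each entry is a symlink to the underlying file, socket or pipe.
            print('%s -> %s' % (fd, os.readlink(os.path.join(fd_dir, fd))))
        except OSError:
            pass  # the fd was closed between listdir() and readlink()

Calling list_open_files() every hundred items or so should make it obvious whether images, sockets, or pipes are the handles accumulating.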

1 Answer


It's almost certainly unclosed file handles accumulating in your loop, though not quite where you might expect. One correction first: subprocess.call is synchronous. It runs the command, waits for it to finish, and returns its exit code, not a pipe object (see the documentation), so the rm calls are not themselves leaking pipes. What does add up is everything the loop opens and never explicitly closes: a fresh PIL image for every thumbnail size, and a new HTTP connection for every requests.get. After roughly a thousand items, those stray handles exhaust the per-process file-descriptor limit.

By far the easiest fix is to stay in Python: close each image once its thumbnail is saved, close the request in download_file before returning (both suggested in the comments above), and remove files by calling os.remove instead of shelling out to the Unix rm command. Your use of check_call is fine as far as handles go; note that it already raises CalledProcessError on a non-zero exit, which makes the `if x: raise` line redundant.
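
Here is a minimal sketch of both functions with every handle closed explicitly. It keeps the structure from the question; S3_CMD and the `item` model are assumed to exist as in your project, and the parallel-save retry loop is left out for brevity:

import os
import shlex
import subprocess
import uuid

import requests
from PIL import Image

def download_file(url, extension='jpg'):
    """Download a file in chunks and return the path to the saved copy."""
    req = requests.get(url, stream=True)  # stream so big files aren't held in memory
    if not req.ok:
        return None
    tmp_filename = '/tmp/%s.%s' % (uuid.uuid4(), extension)
    try:
        with open(tmp_filename, 'wb') as f:  # binary mode for image data
            for chunk in req.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
    finally:
        req.close()  # close the connection instead of leaking the socket
    return tmp_filename

def update_artwork_item(item):
    THUMB_SIZES = [(1000, 120), (1000, 30)]
    S3_BASE_URL = 'https://s3-us-west-1.amazonaws.com/xxx/'
    guid = str(uuid.uuid4())

    tmp_filename = download_file(item.artwork_url)
    if tmp_filename is None:
        return  # download failed; nothing to clean up

    try:
        for size in THUMB_SIZES:
            outfile = '%s_%s.jpg' % (guid, size[1])
            src = Image.open(tmp_filename)
            try:
                # .convert() returns a copy; closing only the copy would leave
                # the opened file's handle alive, so close the source image.
                img = src.convert('RGB')
                img.thumbnail(size, Image.ANTIALIAS)
                img.save(outfile, 'JPEG')
            finally:
                src.close()
            # check_call raises CalledProcessError on a non-zero exit,
            # so no `if x: raise` is needed.
            subprocess.check_call(shlex.split(
                '%s %s premiere-avails --norr --public' % (S3_CMD, outfile)))
            os.remove(outfile)  # delete in-process: no rm subprocess, no FNULL handle
    finally:
        os.remove(tmp_filename)

    item.artwork_120 = S3_BASE_URL + guid + '_120.jpg'
    item.artwork_30 = S3_BASE_URL + guid + '_30.jpg'
    item.save()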

Ken Kinder