How to use image hash as filename for downloaded images?

Question

In Python I want to save an image to file. The filenames should be hashes, generated by imagehash.average_hash(). Using ls -l I see files but they are empty:

-rw-r--r--  1 lorem  lorem     0  8 Sep 16:20 c4c0bcb49890bcfc.jpg
-rwxr-xr-x  1 lorem  lorem   837  8 Sep 16:19 minimal.py

Code:

import requests
from PIL import Image
import imagehash
import shutil

def safe_to_file(url):
    headers = {
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
    image_hash = ''
    r = requests.get(url, headers=headers, timeout=10, stream=True)
    try:
        if r.status_code == 200:
            image_hash = str(imagehash.average_hash(Image.open(r.raw))) + '.jpg'
            print(image_hash)
            with open(image_hash, 'wb') as f:
                r.raw.decode_content = True
                shutil.copyfileobj(r.raw, f)
    except Exception as ex:
        print(str(ex))
    finally:
        return image_hash

# Random jpg picture
url = 'https://cdn.ebaumsworld.com/mediaFiles/picture/1035099/85708057.jpg'
safe_to_file(url)

I would expect images which aren't empty. What am I doing wrong?

If you're not getting any exceptions being printed, I would assume it's because `r.raw` is not a "file-like object" that `copyfileobj()` requires (or is empty if it is). — martineau, Sep 08 '19 at 02:51
It's also possible that the `Image.open(r.raw)` consumed all the image data so there's nothing left for `copyfileobj()` to read. Without and [mcve] it's hard to tell... — martineau, Sep 08 '19 at 03:00
@martineau: Of course that’s it: any image analysis will have to have read at least **some** of the stream, which is then unavailable to `copyfileobj`. — Davis Herring, Sep 08 '19 at 05:24
sorry, just added a minimum reproducible example. @martineau — user131366, Sep 08 '19 at 14:24

martineau · Accepted Answer · 2019-09-08T19:36:59.990

As I suspected, the creation of the PIL.Image object consumes and downloads all the image data from the url, so there's nothing for shutil.copyfileobj() to consume.

The code below seems to avoid that problem by explicitly saving the Image object with the desired hash-based filename. I added comments to indicate the significant changes.

import imagehash
from PIL import Image
import requests
#import shutil


def safe_to_file(url):
    headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/53.0.2785.143 Safari/537.36'}
    image_hash = ''
    r = requests.get(url, headers=headers, timeout=10, stream=True)
    try:
        if r.status_code == 200:
            img = Image.open(r.raw)  # ADDED
            image_hash = str(imagehash.average_hash(img)) + '.jpg'  # CHANGED.
            print('saving image:', image_hash)
            img.save(image_hash)  # ADDED
#            with open(image_hash, 'wb') as f:  # REMOVED
#                r.raw.decode_content = True    # REMOVED
#                shutil.copyfileobj(r.raw, f)   # REMOVED
    except Exception as ex:
        print(str(ex))
    finally:
        return image_hash

# Random jpg picture
url = 'https://cdn.ebaumsworld.com/mediaFiles/picture/1035099/85708057.jpg'
safe_to_file(url)

c4c0bcb49890bcfc.jpg file it created:

How to use image hash as filename for downloaded images?

1 Answers1