How to compare how similar two images are in Python?

Question

I am trying to compare an image I am taking to an image I already have stored on my computer and return True if they are similar enough. Here is a question that is similar to this.

I am using OpenCV, so a solution based on it would be ideal. My current workaround is to open the images with OpenCV, convert them to grayscale, blur them, and write them back to files. I then use Image from PIL and imagehash to compare the hashes of the images before deleting the files. Is there a better way of doing this?

Here's my current code:

def compareImg(imgWarpColor):
    img = cv2.imread("data.jpg")
    img = cv2.resize(img, (660, 880))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    cv2.imwrite("datagray.jpg", img)

    grayImgWarped = cv2.cvtColor(imgWarpColor, cv2.COLOR_BGR2GRAY)
    blurImg = cv2.GaussianBlur(grayImgWarped, (3, 3), 0)
    cv2.imwrite("blurredImage.jpg", blurImg)

    hash0 = imagehash.average_hash(Image.open('blurredImage.jpg'))
    hash1 = imagehash.average_hash(Image.open('datagray.jpeg'))
    cutoff = 5

    hashDiff = hash0 - hash1
    print(hashDiff)
    if hashDiff < cutoff:
        print('These images are similar!')

    filepath = 'C:\Users\MY NAME\PycharmProjects\projectInQuestion'
    
    os.remove(filepath, 'blurredImage.jpg')
    os.remove(filepath, 'datagray.jpg')

A minor note regarding performance. I assume you want the blurred images you're comparing written to file so that is ok; however, you shouldn't need to reread the image files back in when computing their `average_hash` value. Just use `img` and `blurImg` directly instead. — frederick-douglas-pearce, Jul 04 '22 at 02:25
@frederick-douglas-pearce Just tried that and got this error. I'm not sure if OpenCV and imagehash are compatible but I could be wrong ``` in average_hash image = image.convert("L").resize((hash_size, hash_size), Image.ANTIALIAS) AttributeError: 'numpy.ndarray' object has no attribute 'convert' ``` — xkycc, Jul 04 '22 at 02:30
stick with PIL entirely. no need for OpenCV here at all. -- if you must, at least DO NOT write those temp images to files. it's trivial to convert between PIL Image and numpy array... -- also, the strings in your code that contain paths... do you know what a backslash does in a string, in general? — Christoph Rackwitz, Jul 04 '22 at 08:39
@Nolan, you would need to convert the image from OpenCV format to PIL format, e.g. https://www.geeksforgeeks.org/convert-opencv-image-to-pil-image-in-python/. The two have different channel ordering (BGR vs RGB), then convert the numpy array using `Image.fromarray`. I agree with @Christoph Rackwitz though, in that it would be better to just use the PIL library if possible. My point was you're wasting resources on I/O by reading the images from file, writing them to file, then reading them from file again. Only read each image in once, then it is in memory and you can do what you need to do. — frederick-douglas-pearce, Jul 04 '22 at 23:58

score 1 · Accepted Answer · answered Jul 13 '22 at 13:33

The code in my question was a quickly coded mock-up and is by no means an efficient way of doing this. As pointed out by frederick-douglas-pearce, to use OpenCV and PIL together you need to make sure that the images are in the same format.

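OpenCV stores images as NumPy arrays in BGR channel order, while PIL uses its own Image objects in RGB order. To convert an OpenCV image to a PIL image, reorder the channels and then wrap the resulting array with Image.fromarray:

pilImg = Image.fromarray(cv2.cvtColor(openCVImg, cv2.COLOR_BGR2RGB))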
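
As a minimal sketch of that conversion, the snippet below uses a tiny synthetic BGR array in place of a real OpenCV frame (an assumption, so that it runs without a camera or an image file). It reverses the channel axis with NumPy slicing, which reorders the channels of a 3-channel image the same way `cv2.COLOR_BGR2RGB` does, and then wraps the array with `Image.fromarray` as suggested in the comments. The helper name `bgr_to_pil` is made up for illustration.

```python
import numpy as np
from PIL import Image


def bgr_to_pil(bgr_array):
    """Turn an OpenCV-style BGR uint8 array into an RGB PIL image."""
    # Reverse the last axis (B, G, R -> R, G, B). For 3-channel images this
    # has the same effect as cv2.cvtColor(bgr_array, cv2.COLOR_BGR2RGB).
    rgb_array = bgr_array[:, :, ::-1]
    # ascontiguousarray turns the reversed view into a plain contiguous copy.
    return Image.fromarray(np.ascontiguousarray(rgb_array))


# Stand-in for a frame from cv2.imread: a 2x2 image that is pure blue in BGR.
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[:, :, 0] = 255  # channel 0 is blue in OpenCV's ordering

pil_img = bgr_to_pil(frame)
print(pil_img.mode, pil_img.getpixel((0, 0)))
```

Skipping the channel swap would not raise an error; the blue pixel would simply be read as red, which can quietly change the hash of a color image.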

If you are interested in doing something similar to what my original code did, this would be a better way of doing it:

import cv2
import imagehash
from PIL import Image

def compareImages(cv2Img):
    # Convert cv2Img from OpenCV format (BGR NumPy array) to a PIL image (RGB)
    pilImg = Image.fromarray(cv2.cvtColor(cv2Img, cv2.COLOR_BGR2RGB))

    # Get the average hashes of both images
    hash0 = imagehash.average_hash(pilImg)
    hash1 = imagehash.average_hash(Image.open('toBeCompared.jpeg'))
    cutoff = 5  # Can be changed according to what works best for your images

    hashDiff = hash0 - hash1  # Hamming distance between the two hashes
    similar = hashDiff < cutoff
    if similar:
        print('These images are similar!')
    return similar  # True if the images are similar enough

Note that I originally was blurring and gray-scaling images; this is not required as hashing algorithms already do something similar. Also, as pointed out in the comments of this post, writing files only to delete them is highly inefficient, so try to stay away from that if possible.
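
To illustrate why the extra preprocessing adds little, here is a rough sketch of what an average hash does. It is not the library's actual code; it only mirrors the `convert("L").resize(...)` step visible in the traceback quoted in the comments. `Image.LANCZOS` is used because newer Pillow releases no longer provide the `ANTIALIAS` name. The function names and the synthetic gradient images are assumptions made for the example.

```python
import numpy as np
from PIL import Image


def sketch_average_hash(image, hash_size=8):
    """Illustrative average hash: one bit per cell of a tiny grayscale copy."""
    # Grayscale conversion and heavy downscaling happen here, which is why
    # graying or lightly blurring the input beforehand changes very little.
    small = image.convert("L").resize((hash_size, hash_size), Image.LANCZOS)
    pixels = np.asarray(small, dtype=np.float64)
    # A bit is set where the cell is brighter than the mean brightness.
    return pixels > pixels.mean()


def hamming_distance(bits_a, bits_b):
    """Number of positions where the two boolean hashes differ."""
    return int(np.count_nonzero(bits_a != bits_b))


def to_rgb_image(gray_values):
    """Build an RGB PIL image from a 2-D array of gray levels (0..255)."""
    return Image.fromarray(gray_values.astype(np.uint8)).convert("RGB")


# Synthetic stand-ins for real photos: a horizontal gradient, a slightly
# noisy copy of it, and the same gradient rotated to run vertically.
rng = np.random.default_rng(0)
gradient = np.tile(np.linspace(0, 255, 64), (64, 1))
noisy = np.clip(gradient + rng.normal(0, 2, gradient.shape), 0, 255)

original = to_rgb_image(gradient)
near_copy = to_rgb_image(noisy)
different = to_rgb_image(gradient.T)

d_near = hamming_distance(sketch_average_hash(original), sketch_average_hash(near_copy))
d_diff = hamming_distance(sketch_average_hash(original), sketch_average_hash(different))
print("near copy:", d_near, "different:", d_diff)
```

The noisy copy should land well under a cutoff of 5 while the rotated gradient lands far above it, which is the behavior the cutoff comparison relies on.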

If average_hash() isn't working as expected, consider using whash(), phash(), or dhash(), or some combination of the four. All of them are available in the ImageHash library.

score 0 · Answer 2 · answered Jul 04 '22 at 20:49

I had great success doing this with pyautogui, although it does live image matching and is meant for automating simple GUI tasks. I don't know if it helps, but it may have a library dependency in there that does something similar.

Ilya Nazarchuk
