
I am running Python code to check the similarity of Quora and Twitter users' profile photos, but I am not getting a positive result when the images are the same.

This is the code for comparing the two images:

path_photo_quora = "/home/yousuf/Desktop/quora_photo.jpg"
path_photo_twitter = "/home/yousuf/Desktop/twitter_photo.jpeg"

# Compare the raw file bytes of the two photos
with open(path_photo_quora, "rb") as quora_file, open(path_photo_twitter, "rb") as twitter_file:
    if quora_file.read() == twitter_file.read():
        print('photos profile are identical')

Despite the images being the same, the console does not print "photos profile are identical". What can I do?

petezurich
Youcef
  • Possible duplicate of [Checking images for similarity with OpenCV](https://stackoverflow.com/questions/11541154/checking-images-for-similarity-with-opencv) – planetmaker Oct 10 '18 at 09:00

3 Answers


You can use the imagehash library to compare similar images.

from PIL import Image
import imagehash

hash0 = imagehash.average_hash(Image.open('quora_photo.jpg'))
hash1 = imagehash.average_hash(Image.open('twitter_photo.jpeg'))
cutoff = 5  # maximum number of bits that may differ between the hashes

# hash0 - hash1 is the number of differing bits between the two hashes
if hash0 - hash1 <= cutoff:
    print('images are similar')
else:
    print('images are not similar')

Since the images are not exactly the same, there will be some differences, so we use a cutoff value as the maximum acceptable difference. The difference between the two hash objects is the number of bits that differ. imagehash will still work if the images are resized, compressed, saved in different file formats, or have adjusted contrast or colors.

The hash (or, really, fingerprint) is derived from an 8x8 grayscale thumbnail of the image. Even with such a reduced sample, the similarity comparisons give quite accurate results. Adjust the cutoff to find an acceptable balance between false positives and false negatives.
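To make the 8x8 thumbnail idea concrete, here is a minimal sketch of average hashing in plain Python. It is not imagehash's own code: it assumes the image has already been converted to grayscale and shrunk to an 8x8 grid of 0-255 values (imagehash does that step with Pillow), and `average_hash_bits` is a hypothetical helper name. Each bit records whether a pixel is brighter than the mean of all 64 pixels, which is why a uniform brightness shift that does not clip any pixel leaves the hash unchanged.

```python
def average_hash_bits(thumbnail):
    """Return a 64-bit average hash for an 8x8 grid of grayscale values (0-255).

    Each bit is 1 when the corresponding pixel is brighter than the mean
    of all 64 pixels, and 0 otherwise. Bits are packed row by row.
    """
    pixels = [value for row in thumbnail for value in row]
    if len(pixels) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        # Shift in one bit per pixel: brighter than average -> 1
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


# A left-to-right gradient and the same gradient brightened by 20 levels
gradient = [[column * 32 for column in range(8)] for _ in range(8)]
brighter = [[min(255, value + 20) for value in row] for row in gradient]

print(hex(average_hash_bits(gradient)))  # 0xf0f0f0f0f0f0f0f
print(average_hash_bits(gradient) == average_hash_bits(brighter))  # True
```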

With 64-bit hashes, a difference of 0 means the hashes are identical. A difference of 32 means that there's no similarity at all, which is roughly what you'd expect from two unrelated images. A difference of 64 means that one hash is the exact negative of the other.
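The distance itself is just a count of differing bits. The sketch below treats hashes as plain 64-bit integers, which is an assumption for illustration: imagehash stores them as boolean arrays, and `hash0 - hash1` returns the same count. It checks the 0 and 64 cases and, using random bits as a stand-in for unrelated images, estimates the typical distance, which comes out close to 32. The helper names are hypothetical.

```python
import random

HASH_BITS = 64
MASK = (1 << HASH_BITS) - 1


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two 64-bit hashes differ."""
    return bin((hash_a ^ hash_b) & MASK).count("1")


def expected_random_distance(samples=10_000, seed=0):
    """Average distance between pairs of random 64-bit hashes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += hamming_distance(rng.getrandbits(HASH_BITS), rng.getrandbits(HASH_BITS))
    return total / samples


h = 0x0F0F0F0F0F0F0F0F
print(hamming_distance(h, h))          # 0: identical hashes
print(hamming_distance(h, ~h & MASK))  # 64: exact negative
print(expected_random_distance())      # close to 32 for unrelated hashes
```

Each bit of two unrelated hashes matches with probability 1/2, so the expected distance is 64 × 1/2 = 32; a cutoff far below 32 is what keeps false positives rare.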

Håken Lid
  • Is there a way to manipulate those images online without downloading them, i.e. giving the code the image's URL instead of a local path? – Youcef Oct 10 '18 at 15:58
  • Not in the library itself. You must use something like `requests` to download images over HTTP. It's just a few lines of code. https://stackoverflow.com/a/32859290/1977847 – Håken Lid Oct 10 '18 at 16:00
  • I don't want to download the images to my local machine, because it will take too much space when dealing with millions of them – Youcef Oct 10 '18 at 16:07
  • If you use the code I linked to, images will only be processed in memory. They will not be stored on your drive unless you explicitly do so. – Håken Lid Oct 10 '18 at 16:14
  • @MDP, yes. Because images are resized to the same tiny size before hashing. Typically 8x8 pixels. – Håken Lid Feb 23 '19 at 10:23
  • @HåkenLid How did you determine the cutoff value? – Life is complex Aug 06 '19 at 19:40
  • @HåkenLid - I had assumed that the cutoff changed based on the images being evaluated. I'm just trying to determine if 20 would be considered high. – Life is complex Aug 06 '19 at 20:24
  • If I remember correctly, the expected difference between two randomly picked images is approximately 32. So a cutoff of 20 would probably give you too many false positive matches. – Håken Lid Aug 06 '19 at 22:09
  • How can the cutoff be interpreted? Does 5 mean "5% of the image is similar" or should it be interpreted some other way? – lepton Feb 17 '21 at 10:50
  • With a hash/fingerprint/bitmap of 8x8 pixels or 64 bits, a distance of 5 means that 5 of the 64 bits in the two hashes are different. Here's an article that explains how the imagehash algorithms work, and includes example images. https://content-blockchain.org/research/testing-different-image-hash-functions/ – Håken Lid Feb 17 '21 at 15:48
  • So a higher cutoff value means less sensitivity? – Earlee Apr 28 '21 at 02:16
  • Yeah. What I'm calling cutoff here is the maximum number of different bits in the hashes. – Håken Lid Apr 28 '21 at 09:38

The two images are NOT the same; only the thing imaged is. The images obviously have different sizes, as you note yourself, so a byte-by-byte comparison of the files must fail.

You'll need to employ some kind of similarity check. The first step is to scale the smaller image up to the size of the larger one. Then you need some means of detecting and defining similarity. There are different ways and methods for that, and any combination of them might be valid.
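A minimal sketch of those two steps, under assumptions that are not in this answer: both images are already decoded into 2D lists of grayscale values, the upscaling uses nearest-neighbour sampling, and the similarity measure is the mean absolute pixel difference (0 for identical pixels, at most 255). The function names are hypothetical, and the threshold for "similar" is left for you to tune. Scaling to the larger size also stretches the smaller image if the aspect ratios differ.

```python
def resize_nearest(image, new_width, new_height):
    """Nearest-neighbour resize of a 2D list of grayscale values."""
    old_height = len(image)
    old_width = len(image[0])
    return [
        [image[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def mean_absolute_difference(image_a, image_b):
    """Scale the smaller image up to the larger one's size, then average |a - b| per pixel."""
    # Use the image with more pixels as the reference size
    if len(image_a) * len(image_a[0]) < len(image_b) * len(image_b[0]):
        image_a, image_b = image_b, image_a
    height, width = len(image_a), len(image_a[0])
    image_b = resize_nearest(image_b, width, height)
    total = sum(
        abs(a - b)
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    )
    return total / (width * height)


small = [[0, 255],
         [255, 0]]
large = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [255, 255, 0, 0],
         [255, 255, 0, 0]]
inverted = [[255 - value for value in row] for row in large]

print(mean_absolute_difference(small, large))     # 0.0: same picture, different size
print(mean_absolute_difference(small, inverted))  # 255.0: every pixel differs fully
```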

For example, see Checking images for similarity with OpenCV (https://stackoverflow.com/questions/11541154/checking-images-for-similarity-with-opencv).

planetmaker
import cv2


class CompareImage(object):

    def __init__(self, image_1_path, image_2_path):
        # Scores below this threshold count as a match
        self.minimum_commutative_image_diff = 1
        self.image_1_path = image_1_path
        self.image_2_path = image_2_path

    def compare_image(self):
        # Load both images as single-channel grayscale arrays
        image_1 = cv2.imread(self.image_1_path, cv2.IMREAD_GRAYSCALE)
        image_2 = cv2.imread(self.image_2_path, cv2.IMREAD_GRAYSCALE)
        # cv2.imread returns None instead of raising when a file can't be read
        if image_1 is None or image_2 is None:
            raise FileNotFoundError("could not read one of the images")
        commutative_image_diff = self.get_image_difference(image_1, image_2)

        if commutative_image_diff < self.minimum_commutative_image_diff:
            print("Matched")
            return commutative_image_diff
        return 10000  # arbitrary value signalling "no match"

    @staticmethod
    def get_image_difference(image_1, image_2):
        # 256-bin grayscale histograms; these ignore image size and pixel layout
        first_image_hist = cv2.calcHist([image_1], [0], None, [256], [0, 256])
        second_image_hist = cv2.calcHist([image_2], [0], None, [256], [0, 256])

        # 0 for identical histograms, 1 for histograms with no overlap
        img_hist_diff = cv2.compareHist(first_image_hist, second_image_hist, cv2.HISTCMP_BHATTACHARYYA)
        # Same-size inputs give a 1x1 result: the normalized correlation of the histograms
        img_template_probability_match = cv2.matchTemplate(first_image_hist, second_image_hist, cv2.TM_CCOEFF_NORMED)[0][0]
        img_template_diff = 1 - img_template_probability_match

        # taking only 10% of histogram diff, since it's less accurate than template method
        commutative_image_diff = (img_hist_diff / 10) + img_template_diff
        return commutative_image_diff


if __name__ == '__main__':
    compare_image = CompareImage('image1/path', 'image2/path')
    image_difference = compare_image.compare_image()
    print(image_difference)
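Since the answer does not explain its score, here is a plain-Python sketch of what the two OpenCV calls compute, for illustration only (OpenCV's own implementations differ in detail). The Bhattacharyya distance follows the formula in the OpenCV `compareHist` documentation: 0 for identical histograms and 1 for histograms with no overlap. `TM_CCOEFF_NORMED` applied to two inputs of the same size reduces to the Pearson correlation of the histograms. The function names are hypothetical. Because only grayscale histograms are compared, image size and pixel layout are ignored, so two different photos with a similar tonal distribution can still score as a match.

```python
import math


def bhattacharyya_distance(hist_a, hist_b):
    """Distance per OpenCV's HISTCMP_BHATTACHARYYA formula: 0 = identical, 1 = no overlap."""
    sum_a, sum_b = sum(hist_a), sum(hist_b)
    overlap = sum(math.sqrt(a * b) for a, b in zip(hist_a, hist_b))
    # max() guards against tiny negative values from floating-point rounding
    return math.sqrt(max(0.0, 1.0 - overlap / math.sqrt(sum_a * sum_b)))


def normalized_correlation(hist_a, hist_b):
    """Pearson correlation, i.e. TM_CCOEFF_NORMED for two same-size inputs."""
    n = len(hist_a)
    mean_a, mean_b = sum(hist_a) / n, sum(hist_b) / n
    dev_a = [a - mean_a for a in hist_a]
    dev_b = [b - mean_b for b in hist_b]
    denom = math.sqrt(sum(d * d for d in dev_a) * sum(d * d for d in dev_b))
    if denom == 0:
        return 0.0  # assumption: flat histograms are treated as uncorrelated
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def combined_difference(hist_a, hist_b):
    """The answer's score: 10% of the Bhattacharyya distance plus (1 - correlation)."""
    return bhattacharyya_distance(hist_a, hist_b) / 10 + (1 - normalized_correlation(hist_a, hist_b))


print(combined_difference([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0: identical histograms match
print(combined_difference([1, 0, 0, 0], [0, 0, 0, 1]))  # about 1.43: above the threshold of 1
```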
Gokul Raghu
  • Hi, and thanks for the answer. It would really help our readers if you could explain why and how your answer solves the OP's problem – Simas Joneliunas Jan 25 '22 at 13:05