Since no one has posted a complete concrete example, here are two quantitative methods to determine the similarity between two images: one for comparing images with the same dimensions, and another that is scale-invariant and indifferent to transformations. Both methods return a similarity score between 0 and 100, where 0 represents a completely different image and 100 represents an identical/duplicate image. For all values in between: the lower the score, the less similar the images; the higher the score, the more similar.
Method #1: Structural Similarity Index (SSIM)
To compare differences and determine the exact discrepancies between two images, we can utilize the Structural Similarity Index (SSIM), which was introduced in Image Quality Assessment: From Error Visibility to Structural Similarity. SSIM is an image quality assessment approach that estimates the degradation of structural similarity based on the statistical properties of local information between a reference and a distorted image. SSIM values fall in the range [-1, 1]; the metric is typically calculated using a sliding window, with the SSIM value for the whole image computed as the average across all individual window results. This method is already implemented in the scikit-image library for image processing and can be installed with pip install scikit-image.
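For reference, the SSIM between two aligned windows x and y (as defined in the original paper and used by scikit-image's default settings) is

SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}

where \mu_x and \mu_y are the window means, \sigma_x^2 and \sigma_y^2 the variances, \sigma_{xy} the covariance, and c_1 = (k_1 L)^2, c_2 = (k_2 L)^2 are small constants that stabilize the division, with L the dynamic range of the pixel values (255 for 8-bit images) and k_1 = 0.01, k_2 = 0.03 by default.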
The skimage.metrics.structural_similarity() function returns a comparison score and a difference image, diff. The score represents the mean SSIM between the two images, with higher values indicating higher similarity. The diff image contains the actual image differences, with darker regions having more disparity: larger areas of disparity are highlighted in black while smaller differences are in gray. Here's an example:
Input images
Difference image -> highlighted mask differences
The SSIM score after comparing the two images shows that they are very similar.
Similarity Score: 89.462%
To visualize the exact differences between the two images, we can iterate through each contour, filter using a minimum threshold area to remove tiny noise, and highlight discrepancies with a bounding box.
Limitations: Although this method works very well, there are some important limitations. The two input images must have the same size/dimensions, and the approach also suffers from a few problems including scaling, translations, rotations, and distortions (a resize-based workaround for the dimension requirement is sketched after the code below). SSIM also does not perform very well on blurry or noisy images. These problems are addressed in Method #2.
Code:
from skimage.metrics import structural_similarity
import cv2
import numpy as np
first = cv2.imread('clownfish_1.jpeg')
second = cv2.imread('clownfish_2.jpeg')
# Convert images to grayscale
first_gray = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
second_gray = cv2.cvtColor(second, cv2.COLOR_BGR2GRAY)
# Compute SSIM between two images
score, diff = structural_similarity(first_gray, second_gray, full=True)
print("Similarity Score: {:.3f}%".format(score * 100))
# The diff image contains the actual image differences between the two images
# and is represented as a floating point data type so we must convert the array
# to 8-bit unsigned integers in the range [0,255] before we can use it with OpenCV
diff = (diff * 255).astype("uint8")
# Threshold the difference image, followed by finding contours to
# obtain the regions that differ between the two images
thresh = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
# Highlight differences
mask = np.zeros(first.shape, dtype='uint8')
filled = second.copy()
for c in contours:
    area = cv2.contourArea(c)
    if area > 100:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(first, (x, y), (x + w, y + h), (36,255,12), 2)
        cv2.rectangle(second, (x, y), (x + w, y + h), (36,255,12), 2)
        cv2.drawContours(mask, [c], 0, (0,255,0), -1)
        cv2.drawContours(filled, [c], 0, (0,255,0), -1)
cv2.imshow('first', first)
cv2.imshow('second', second)
cv2.imshow('diff', diff)
cv2.imshow('mask', mask)
cv2.imshow('filled', filled)
cv2.waitKey()
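If the only obstacle is mismatched dimensions (rather than actual scaling or rotation of the content), one workaround is to resize one image to the other's dimensions before computing SSIM. The following is a minimal sketch rather than part of the method above; the file names are placeholders, and since interpolation alters pixel statistics, the resulting score should be treated as approximate:

from skimage.metrics import structural_similarity
import cv2

# Placeholder file names -- substitute your own images
first = cv2.imread('image_1.jpg')
second = cv2.imread('image_2.jpg')

# cv2.resize expects (width, height), i.e. (columns, rows)
second = cv2.resize(second, (first.shape[1], first.shape[0]), interpolation=cv2.INTER_AREA)

# Convert to grayscale and compute the mean SSIM score
first_gray = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
second_gray = cv2.cvtColor(second, cv2.COLOR_BGR2GRAY)
score = structural_similarity(first_gray, second_gray)
print("Similarity Score: {:.3f}%".format(score * 100))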
Method #2: Dense Vector Representations
Typically, two images will not be exactly the same. They may have variations such as slightly different backgrounds, dimensions, feature additions/subtractions, or transformations (scaled, rotated, skewed). In other words, we cannot use a direct pixel-to-pixel approach: with these variations, the problem shifts from identifying pixel similarity to identifying object similarity, so we must switch to deep-learning feature models instead of comparing individual pixel values.
To determine identical and near-similar images, we can use the sentence-transformers library, which provides an easy way to compute dense vector representations for images, together with the OpenAI Contrastive Language-Image Pre-Training (CLIP) model, a neural network already trained on a variety of (image, text) pairs. The idea is to encode all images into vector space and then find high-density regions, which correspond to areas where the images are fairly similar.
When two images are compared, they are given a score between 0 and 1.00. We can use a threshold parameter to decide whether two images are similar or different. A lower threshold results in larger clusters containing less similar images; conversely, a higher threshold results in smaller clusters containing only more similar images. A duplicate pair will have a score of 1.00, meaning the two images are exactly the same. To find near-similar images, we can set the threshold to any arbitrary value, say 0.9. For instance, if the score between two images is greater than 0.9, we can conclude they are near-similar images.
An example:

This dataset has five images; notice how there are duplicates of flower #1 while the others are different.
Identifying duplicate images
Score: 100.000%
.\flower_1 copy.jpg
.\flower_1.jpg
Both flower #1 and its copy are the same
Identifying near-similar images
Score: 97.141%
.\cat_1.jpg
.\cat_2.jpg
Score: 95.693%
.\flower_1.jpg
.\flower_2.jpg
Score: 57.658%
.\cat_1.jpg
.\flower_1 copy.jpg
Score: 57.658%
.\cat_1.jpg
.\flower_1.jpg
Score: 57.378%
.\cat_1.jpg
.\flower_2.jpg
Score: 56.768%
.\cat_2.jpg
.\flower_1 copy.jpg
Score: 56.768%
.\cat_2.jpg
.\flower_1.jpg
Score: 56.284%
.\cat_2.jpg
.\flower_2.jpg
We get more interesting results between different images. The higher the score, the more similar; the lower the score, the less similar. Using a threshold of 0.9 or 90%, we can filter out near-similar images.
Comparison between just two images
Score: 97.141%
.\cat_1.jpg
.\cat_2.jpg
Score: 95.693%
.\flower_1.jpg
.\flower_2.jpg
Score: 88.914%
.\ladybug_1.jpg
.\ladybug_2.jpg
Score: 94.503%
.\cherry_1.jpg
.\cherry_2.jpg
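The pairwise scores above can be reproduced with a small standalone snippet. This is a minimal sketch using the same clip-ViT-B-32 model together with sentence-transformers' cosine-similarity utility (util.cos_sim); the two file names are taken from the example dataset above and can be substituted with any pair of images:

from sentence_transformers import SentenceTransformer, util
from PIL import Image

# Load the OpenAI CLIP Model
model = SentenceTransformer('clip-ViT-B-32')

# Encode the two images to compare into dense vectors
embeddings = model.encode([Image.open('cat_1.jpg'), Image.open('cat_2.jpg')], convert_to_tensor=True)

# Cosine similarity between the two embeddings, reported as a percentage
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print("Score: {:.3f}%".format(score * 100))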
Code:
from sentence_transformers import SentenceTransformer, util
from PIL import Image
import glob
import os
# Load the OpenAI CLIP Model
print('Loading CLIP Model...')
model = SentenceTransformer('clip-ViT-B-32')
# Next we compute the embeddings
# To encode an image, you can use the following code:
# from PIL import Image
# encoded_image = model.encode(Image.open(filepath))
image_names = list(glob.glob('./*.jpg'))
print("Images:", len(image_names))
encoded_image = model.encode([Image.open(filepath) for filepath in image_names], batch_size=128, convert_to_tensor=True, show_progress_bar=True)
# Now we run the clustering algorithm. This function compares images against
# all other images and returns a list with the pairs that have the highest
# cosine similarity score
processed_images = util.paraphrase_mining_embeddings(encoded_image)
NUM_SIMILAR_IMAGES = 10
# =================
# DUPLICATES
# =================
print('Finding duplicate images...')
# Filter list for duplicates. Results are triplets (score, image_id1, image_id2) and are sorted in decreasing order
# A duplicate image will have a score of 1.00
# It may be 0.9999 due to lossy image compression (.jpg)
duplicates = [image for image in processed_images if image[0] >= 0.999]
# Output the top X duplicate images
for score, image_id1, image_id2 in duplicates[0:NUM_SIMILAR_IMAGES]:
    print("\nScore: {:.3f}%".format(score * 100))
    print(image_names[image_id1])
    print(image_names[image_id2])
# =================
# NEAR DUPLICATES
# =================
print('Finding near duplicate images...')
# Use a threshold parameter to identify two images as similar. By setting the threshold lower,
# you will get larger clusters which have less similar images in them. Threshold range: 0 - 1.00
# A score of 1.00 means the two images are exactly the same. Since we are finding near-duplicate
# images, we can set the threshold at 0.99 or any number 0 < X < 1.00.
threshold = 0.99
near_duplicates = [image for image in processed_images if image[0] < threshold]
for score, image_id1, image_id2 in near_duplicates[0:NUM_SIMILAR_IMAGES]:
    print("\nScore: {:.3f}%".format(score * 100))
    print(image_names[image_id1])
    print(image_names[image_id2])