measuring similarity between two rgb images in python

Question

I have two rgb images of same size, and I would like to compute a similarity metric. I thought of starting out with euclidean distance:

import scipy.spatial.distance as dist
import cv2

im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")

>> im1.shape
(820, 740, 3)

>> dist.euclidean(im1,im2)

ValueError: Input vector should be 1-D.

I know that dist.euclidean expects a 1-D array and im1 and im2 are 3-D, but is there a function that will work with 3-D arrays, or is it possible to transform im1 and im2 into a 1-D array that preserves the information in the images?

`numpy.reshape()` is your friend. Your similarity measure won't be of much use with images though. — ypnos, Sep 12 '19 at 21:26
@ypnos, thanks! yes you are right, the image comparison with euclidean distance gives me nonsensical results. I suppose it's not meant for image comparison... What do you suggest I use instead? — HappyPy, Sep 12 '19 at 21:41
@HappyPy, SSIM represents the structural similarity index between the two input images. This value can fall into the range `[-1, 1]` with a value of one being a “perfect match”. This method will give you a quantitative measurement between two images. It also returns the actual image differences between the two input images but for your case, I think you're just interested in the actual image similarity "score". The method comes from the paper [Image Quality Assessment: From Error Visibility to Structural Similarity](https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf) — nathancy, Sep 12 '19 at 21:48
This method is already implemented in `scikit-image` as [`compare_ssim`](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.compare_ssim) — nathancy, Sep 12 '19 at 21:48

bballdave025 · Answer 1 · 2021-03-26T22:34:01.007

Grayscale Solution (?)

(There is a discussion below as to your comment about a function "that preserves the information in the images")

It seems possible to me that you might be able to solve the problem using a grayscale image rather than an RGB image. I know I'm making assumptions here, but it's a thought.

I'm going to try a simple example relating to your code, then give an example of an image similarity measure using 2D Discrete Fourier Transforms that uses a conversion to grayscale. That DFT analysis will have its own section

(My apologies if you see this while in progress. I'm just trying to make sure my work is saved.)

Because of my assumption, I'm going to try your method with some RGB images, then see if the problem will be solved by converting to grayscale. If the problem is solved with grayscale, we can do an analysis of the amount of information loss brought on by the grayscale solution by finding the image similarity using a combination of all three channels, each compared separately.

Method

Making sure I have all the libraries/packages/whatever you want to call them.

> python -m pip install opencv-python
> python -m pip install scipy
> python -m pip install numpy

Note that, in this trial, I'm using some PNG images that were created in the attempt (described below) to use a 2D DFT.

Making sure I get the same problem

>>> import scipy.spatial.distance as dist
>>> import cv2
>>>
>>> im1 = cv2.imread("rhino1_clean.png")
>>> im2 = cv2.imread("rhino1_streak.png")
>>>
>>> im1.shape
(178, 284, 3)
>>>
>>> dist.euclidean(im1, im2)
## Some traceback stuff ##
ValueError: Input vector should be 1-D.

Now, let's try using grayscale. If this works, we can simply find the distance for each of the RGB Channels. I hope it works, because I want to do the information-loss analysis.

Let's convert to grayscale:

>>> im1_gray = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
>>> im2_gray = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)

>>> im1_gray.shape
(178, 284)

A simple dist.euclidean(im1_gray, im2,gray) will lead to the same ValueError: Input vector should be 1-D. exception, but I know the structure of a grayscale image array (an array of pixel rows), so I do the following.

>>> dists = []
>>> for i in range(0, len(im1_gray)):
...   dists.append(dist.euclidean(im1_gray[i], im2_gray[i]))
...
>>> sum_dists = sum(dists)
>>> ave_dist = sum_dists/len(dists)
>>> ave_dist
2185.9891304058297

By the way, here are the two original images:

Grayscale worked (with massaging), let's try color

Following some procedure from this SO answer, let's do the following.

Preservation of Information

Following the analysis here (archived), let's look at our information loss. (Note that this will be a very naïve analysis, but I want to give a crack at it.

Grayscale vs. Color Information

Let's just look at the color vs. the grayscale. Later, we can look at whether we preserve the information about the distances.

Comparisons of different distance measures using grayscale vs. all three channels - comparison using ratio of distance sums for a set of images.

I don't know how to do entropy measurements for the distances, but my intuition tells me that, if I calculate distances using grayscale and using color channels, I should come up with similar ratios of distances IF I haven't lost any information.

My first thought when seeing this question was to use a 2-D Discrete Fourier Transform, which I'm sure is available in Python or NumPy or OpenCV. Basically, your first components of the DFT will relate to large shapes in your image. (Here is where I'll put in a relevant research paper: link. I didn't look too closely - anyone is welcome to suggest another.)

So, let me look up a 2-D DFT easily available from Python, and I'll get back to putting up some working code.

(My apologies if you see this while in progress. I'm just trying to make sure my work is saved.)

First, you'll need to make sure you have ~~PIL~~ Pillow and NumPy. It seems you have NumPy, but here are some instructions. (Note that I'm on Windows at the moment) ...

> python -m pip install opencv-python
> python -m pip install numpy
> python -m pip install pillow

Now, here are 5 images -

a rhino image, rhino1_clean.jpg (source);

the same image with some black streaks drawn on by me in MS Paint, rhino1_streak.jpg;

another rhino image, rhino2_clean.jpg (source);

a first hippo image hippo1_clean.jpg (source);

a second hippo image, hippo2_clean.jpg (source).

All images used with fair use.

Okay, now, to illustrate further, let's go to the Python interactive terminal.

>python

>>> import PIL
>>> import numpy as np

First of all, life will be easier if we use grayscale PNG images - PNG because it's a straight bitmap (rather than a compressed image), grayscale because I don't have to show all the details with the channels.

>>> rh_img_1_cln = PIL.Image.open("rhino1_clean.jpg")
>>> rh_img_1_cln.save("rhino1_clean.png")
>>> rh_img_1_cln_gs = PIL.Image.open("rhino1_clean.png").convert('LA')
>>> rh_img_1_cln_gs.save("rhino1_clean_gs.png")

Follow similar steps for the other four images. I used PIL variable names, rh_img_1_stk, rh_img_2_cln, hp_img_1_cln, hp_img_2_cln. I ended up with the following image filenames for the grayscale images, which I'll use further: rhino1_streak_gs.png, rhino2_clean_gs.png, hippo1_clean_gs.png, hippo2_clean_gs.png.

Now, let's get the coefficients for the DFTs. The following code (ref. this SO answer) would be used for the first, clean rhino image.

Let's "look" at the image array, first. This will show us a grid version of the top-left column, with higher values being more white and lower values being more black.

Note that, before I begin outputting this array, I set things to the numpy default, cf. https://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html

>>> np.set_printoptions(edgeitems=3,infstr='inf',
... linewidth=75, nanstr='nan', precision=8,
... suppress=False, threshold=1000, formatter=None)

>>> rh1_cln_gs_array = np.array(rh_img_1_cln_gs)
>>> for i in {0,1,2,3,4}:
...   print(rh1_cln_gs_array[i][:13])
...
[93 89 78 87 68 74 58 51 73 96 90 75 86]
[85 93 64 64 76 49 19 52 65 76 86 81 76]
[107  87  71  62  54  31  32  49  51  55  81  87  69]
[112  93  94  72  57  45  58  48  39  49  76  86  76]
[ 87 103  90  65  88  61  44  57  34  55  70  80  92]

Now, let's run the DFT and look at the results. I change my numpy print options to make things nicer before I start the actual transform.

>>> np.set_printoptions(formatter={'all':lambda x: '{0:.2f}'.format(x)})
>>>
>>> rh1_cln_gs_fft = np.fft.fft2(rh_img_1_cln_gs)
>>> rh1_cln_gs_scaled_fft = 255.0 * rh1_cln_gs_fft / rh1_cln_gs_fft.max()
>>> rh1_cln_gs_real_fft = np.absolute(rh1_cln_gs_scaled_fft)
>>> for i in {0,1,2,3,4}:
...   print(rh1_cln_gs_real_fft[i][:13])
...
[255.00 1.46 7.55 4.23 4.53 0.67 2.14 2.30 1.68 0.77 1.14 0.28 0.19]
[38.85 5.33 3.07 1.20 0.71 5.85 2.44 3.04 1.18 1.68 1.69 0.88 1.30]
[29.63 3.95 1.89 1.41 3.65 2.97 1.46 2.92 1.91 3.03 0.88 0.23 0.86]
[21.28 2.17 2.27 3.43 2.49 2.21 1.90 2.33 0.65 2.15 0.72 0.62 1.13]
[18.36 2.91 1.98 1.19 1.20 0.54 0.68 0.71 1.25 1.48 1.04 1.58 1.01]

Now, the result for following the same procedure with rhino1_streak.jpg

[255.00 3.14 7.69 4.72 4.34 0.68 2.22 2.24 1.84 0.88 1.14 0.55 0.25]
[40.39 4.69 3.17 1.52 0.77 6.15 2.83 3.00 1.40 1.57 1.80 0.99 1.26]
[30.15 3.91 1.75 0.91 3.90 2.99 1.39 2.63 1.80 3.14 0.77 0.33 0.78]
[21.61 2.33 2.64 2.86 2.64 2.34 2.25 1.87 0.91 2.21 0.59 0.75 1.17]
[18.65 3.34 1.72 1.76 1.44 0.91 1.00 0.56 1.52 1.60 1.05 1.74 0.66]

I'll print \Delta values instead of doing a more comprehensive distance. You could sum the squares of the values shown here, if you want a distance.

>>> for i in {0,1,2,3,4}:
...   print(rh1_cln_gs_real_fft[i][:13] - rh1_stk_gs_real_fft[i][:13])
...
[0.00 -1.68 -0.15 -0.49 0.19 -0.01 -0.08 0.06 -0.16 -0.11 -0.01 -0.27
 -0.06]
[-1.54 0.64 -0.11 -0.32 -0.06 -0.30 -0.39 0.05 -0.22 0.11 -0.11 -0.11 0.04]
[-0.53 0.04 0.14 0.50 -0.24 -0.02 0.07 0.30 0.12 -0.11 0.11 -0.10 0.08]
[-0.33 -0.16 -0.37 0.57 -0.15 -0.14 -0.36 0.46 -0.26 -0.07 0.13 -0.14
 -0.04]
[-0.29 -0.43 0.26 -0.58 -0.24 -0.37 -0.32 0.15 -0.27 -0.12 -0.01 -0.17
 0.35]

I'll be putting just three coefficient arrays truncated to a length of five to show how this works for showing image similarity. Honestly, this is an experiment for me, so we'll see how it goes.

You can work on comparing those coefficients with distances or other metrics.

More About Preservation of Information

Let's do an information-theoretical analysis of information loss with the methods proposed above. Following the analysis here (archived), let's look at our information loss.

Good luck!

score 0 · Answer 2 · answered Sep 12 '19 at 21:33

0

You can try

import scipy.spatial.distance as dist
import cv2
import numpy as np

im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")

dist.euclidean(im1.flatten(), im2.flatten())

answered Sep 12 '19 at 21:33

aminrd

4,300
4
23
45

moe asal · Answer 3 · 2019-09-12T21:42:10.433

0

You can use the reshape function for both images to convert them from 3D to 1D.

import scipy.spatial.distance as dist
import cv2

im1 = cv2.imread("im1.jpg")
im2 = cv2.imread("im2.jpg")

im1.reshape(1820400)
im2.reshape(1820400)

dist.euclidean(im1,im2)

edited Sep 12 '19 at 21:42

answered Sep 12 '19 at 21:34

moe asal

750
8
24

measuring similarity between two rgb images in python

3 Answers3

Grayscale Solution (?)

Preservation of Information

More About Preservation of Information

Linked