
My goal is to get a percentage of similarity between two images.

The point is that my definition of similarity is kind of special in this case.

Here are some examples of what I want to achieve:

Image A

is similar to

Image A bis

HOWEVER,

Image B

is not similar to Image A (or A bis) but is similar to

Image B bis

I have already tried the methods described in Checking images for similarity with OpenCV, but they didn't work in my case: a plain black background scored as more similar to Image A than Image A bis did.

PS: I also tried with colored versions, but the results were the same:

Image A colored

Image A bis colored

Image B colored

I did more research, and someone suggested I could achieve what I want using the FFT (fast Fourier transform) with OpenCV (which is what I use).

When applying the FFT, this is what I get:

Image A FFT

Image A bis FFT

Image B FFT

Image B bis FFT

This leads me to these questions: is the FFT really the way to go? If yes, what can I do with my magnitude spectra? If no, is there another approach that could solve my problem?
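For reference, a centered log-magnitude spectrum like the ones above can be computed with NumPy alone (OpenCV's cv.dft would work as well), and two spectra can be compared with a simple normalized correlation. This is only a sketch; the function names are mine, not an established API:

```python
import numpy as np

def magnitude_spectrum(img):
    f = np.fft.fft2(img.astype(np.float32))
    f = np.fft.fftshift(f)            # move the DC component to the center
    return np.log1p(np.abs(f))        # log scale, as in the images above

def spectrum_similarity(a, b):
    # Pearson-like correlation of the two spectra, roughly in [-1, 1]
    sa, sb = magnitude_spectrum(a), magnitude_spectrum(b)
    sa = (sa - sa.mean()) / (sa.std() + 1e-9)
    sb = (sb - sb.mean()) / (sb.std() + 1e-9)
    return float(np.mean(sa * sb))
```

Note that the magnitude spectrum discards phase, so it is translation-invariant but not scale-invariant, which only partially matches the requirements below.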

PS: I would rather not use ML or deep learning if possible.

Thanks! (and sorry for the quantity of images)

EDIT 1 :

your metric could be the number of overlapping pixels divided by the logical OR of the two images

Why I haven't done this so far: sometimes the form can be at the top of the image whereas the form in my example is at the bottom. Moreover, one form can be much smaller than the one in the example even though they are still the same.

EDIT 2 :

I am looking for local similarity. In fact, the size doesn't matter as long as the form has the same shape as the example. It can be much bigger, smaller, located at the top or the bottom... It's all about the shape of the form. However, the form must keep the same direction and rotation.

For instance, here are two images that must be classified as Image A :

Image A bis bis Image A bis bis bis

EDIT 3 :

The pictures you see are 30 stacked frames of a hand motion. That's why in the Image A* pictures you see two blobs: the swipe goes from left to right and the AI doesn't detect the hand in the center. Because the swipe isn't perfect, the "blobs" are not at the same height every time. Moreover, if the hand is closer to the camera you get Image A; if it is further away you get Image A bis bis from the EDIT 2 section.

EDIT 4 :

The problem with IoU as I tried it, following @Christoph Rackwitz's answer, is that it doesn't work in the case of Image A and a smaller Image A (see the EDIT 2 images).

  • for binary images your metric could be the number of overlapping pixels divided by the logical OR of the two images. This will give you the percentage of pixels that belong to both images. –  Jul 15 '22 at 15:17
  • I added an edit! Unless you have a solution for the problem I pinpointed, it won't be possible to go this way. – Bonsai Noodle Jul 15 '22 at 15:23
  • How do you define "similar" and "same" then? "one form could be much smaller than the one in the example even though they are still the same" means you want a scale-invariant measure. "the form that you see in the examples could be on top of the image whereas the form in my example is at the bottom" means you want a translation-invariant measure. Should it be rotation-invariant as well? Do you want to measure local similarity or global similarity? – Cris Luengo Jul 15 '22 at 15:40
  • If the positions of the forms are not in the same location or the same size, then "similarity" is a very vague concept. You can try running a keypoint detector and descriptor on the image and rescale it several times to achieve scale invariance. Rotate it several times if you need rotation invariance. Then you can try to apply RANSAC to match the two images, and you can get the number of matching points as a metric for how similar the images are. –  Jul 15 '22 at 15:42
  • The local similarity. In fact, the size doesn't matter as long as the form itself is the same shape as the example. Could be much bigger, smaller, located on top, on bottom... It's all about the shape of the form. However, the form must be in the same direction and rotation. – Bonsai Noodle Jul 15 '22 at 15:42
  • Then do what I suggested. You need a descriptor that is scale and location invariant for both images. Then try matching them –  Jul 15 '22 at 15:43
  • Ok, I am going to search more on RANSAC right now. So should I give up on FFT ? – Bonsai Noodle Jul 15 '22 at 15:45
  • What about shape similarity? See cv2.matchShapes() at https://docs.opencv.org/4.1.1/d3/dc0/group__imgproc__shape.html#gaadc90cb16e2362c9bd6e7363e6e4c317 – fmw42 Jul 15 '22 at 16:14
  • intersection over union. calculate it. don't overthink it. you have masks here. – Christoph Rackwitz Jul 15 '22 at 16:26
  • if IOU doesn't work for you, present data for which that doesn't work. – Christoph Rackwitz Jul 15 '22 at 16:35
  • Bonsai Noodle, please [edit] your post to clearly indicate what you are looking for. Comments under the question are not the right place to amend the question, people will answer without reading the comments, and comments can be deleted at any time. Please also indicate what the meaning is of the colored images. Do you need to match the black & white images, or the colored images, or both? Make sure your question is clear and unambiguous so people don't waste time writing answers that are not useful to you. See [ask]. – Cris Luengo Jul 15 '22 at 16:48
  • I edited the question - I am trying @Christoph Rackwitz's answer right now. – Bonsai Noodle Jul 15 '22 at 16:52
  • sample data please. we can't be sure that we understand what you're describing. don't just describe. provide the data itself. – Christoph Rackwitz Jul 15 '22 at 17:03
  • ok those two pictures... what kind of similarity score do you expect here? one blob moves, but the other one did not move. is the moving blob supposed to be matched as "same" or not? neither looks very similar to picture "A" -- where is this data even from? what is the _source_ of these pictures? we can't solve a problem you aren't giving us to solve. – Christoph Rackwitz Jul 15 '22 at 17:06
  • Let me edit the question again. I'm sorry for this... – Bonsai Noodle Jul 15 '22 at 17:08
  • You could try FFT based phase correlation. See https://en.wikipedia.org/wiki/Phase_correlation – fmw42 Jul 15 '22 at 17:39
  • It looks like FFT-based phase correlation only works for translation, not for completely different images, doesn't it? – Bonsai Noodle Jul 16 '22 at 09:37

1 Answer


Intersection Over Union:


import cv2 as cv
import numpy as np

files = ["A svk5g.png", "A bis aWBFd.png", "B x8214.png", "B bis b3Bdw.png"]
ims = [cv.imread(f, cv.IMREAD_GRAYSCALE).astype(bool) for f in files]

def iou(a, b):
    # intersection over union of two boolean masks
    return np.sum(a & b) / np.sum(a | b)

scores = np.array([[iou(a, b) for b in ims] for a in ims])
# array([[1.     , 0.88364, 0.07373, 0.08069],
#        [0.88364, 1.     , 0.06857, 0.06803],
#        [0.07373, 0.06857, 1.     , 0.30637],
#        [0.08069, 0.06803, 0.30637, 1.     ]])

So you see: an 88% match between "A" and "A bis", just a 30% match between "B" and "B bis", and 7-8% between the A* and B* images.

Christoph Rackwitz