I am calculating the Structural Similarity Index (SSIM) between two images, and I don't understand what shape the input arrays should have. Both images (reference and target) are RGB.

If I reshape my images to (256*256, 3), I get:

    from PIL import Image
    import numpy as np
    from skimage.measure import compare_ssim   # scikit-image < 0.18

    ref = Image.open('path1').convert("RGB")
    ref_array = np.array(ref).reshape(256*256, 3)
    print(ref_array.shape)    # (65536, 3)
    img = Image.open('path2').convert("RGB")
    img_array = np.array(img).reshape(256*256, 3)
    print(img_array.shape)    # (65536, 3)

    ssim = compare_ssim(ref_array, img_array, multichannel=True, data_range=255)

The result is 0.0786.

On the other hand, if I keep the arrays in their original (256, 256, 3) shape:

    ref = Image.open('path1').convert("RGB")
    ref_array = np.array(ref)
    print(ref_array.shape)    # (256, 256, 3)
    img = Image.open('path2').convert("RGB")
    img_array = np.array(img)
    print(img_array.shape)    # (256, 256, 3)

    ssim = compare_ssim(ref_array, img_array, multichannel=True, data_range=255)

The result is 0.0583.

Which of the two results is correct, and why? The documentation doesn't address this, probably because it's a conceptual question rather than an implementation detail.

maurock
  • Out of interest, are the images supposed to look alike? SSIM was an alternative to MSE for evaluating "before" and "after" filtering/compression, so I don’t think it’ll do well with rotations or slightly different crops. – Pam Mar 12 '20 at 19:10
  • I am using a ML approach for rendering, so the generated image is supposed to look like the reference image for N samples -> inf. SSIM for this task is often much more solid than MSE. – maurock Mar 12 '20 at 19:13
  • Like an auto encoder? It seems pretty low but that might be the case if you’re just at the start of training. – Pam Mar 12 '20 at 19:16
  • Not like an autoencoder. It's a variation of MC importance sampling for physically based rendering, where the sampling method is based on reinforcement learning. The SSIM is very low because I compared the reference generated with 5120 SPP vs. an example at 8 SPP (just as an example) – maurock Mar 12 '20 at 19:19
  • Ok, that sounds sensible. One other thing, I’ve used SSIM on YUV colourspace (1 greyscale channel, 2 colour channels). [This](https://stackoverflow.com/questions/52798540/working-with-ssim-loss-function-in-tensorflow-for-rgb-images) question implies RGB might give different values. But, it’s "old" and my experience of SSIM is not recent! – Pam Mar 12 '20 at 19:23
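
For what it's worth, a minimal sketch of the luma-only comparison mentioned in the last comment. The paths are the placeholders from the question, the conversion uses Pillow's built-in YCbCr mode, and the call uses the current `skimage.metrics.structural_similarity` API rather than the older `compare_ssim`:

    import numpy as np
    from PIL import Image
    from skimage.metrics import structural_similarity

    # Convert to YCbCr and keep only the luma (Y) plane, so SSIM runs on a
    # single greyscale channel instead of three correlated RGB channels.
    ref_y = np.array(Image.open('path1').convert("YCbCr"))[..., 0]
    img_y = np.array(Image.open('path2').convert("YCbCr"))[..., 0]

    print(structural_similarity(ref_y, img_y, data_range=255))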

1 Answer

The second one is correct, assuming your image really is 256 × 256 and not an extremely long, thin one.

SSIM takes neighbouring pixels into account (for luminance and chrominance masking and for identifying structures). The image can be any shape, but if you tell the algorithm your image is 256*256 pixels by 1 pixel, the sliding window can only move along one axis, so vertical structure is never taken into account.
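
To make this concrete, here is a minimal, self-contained sketch with synthetic data. It uses the current API, `skimage.metrics.structural_similarity` with `channel_axis=-1`, which replaced `compare_ssim` / `multichannel=True` in newer scikit-image releases:

    import numpy as np
    from skimage.metrics import structural_similarity

    # Synthetic reference image and a noisy copy of it.
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
    img = np.clip(ref + rng.integers(-20, 21, ref.shape), 0, 255).astype(np.uint8)

    # (256, 256, 3): the 7x7 SSIM window slides over rows AND columns.
    good = structural_similarity(ref, img, channel_axis=-1, data_range=255)

    # (65536, 3): the "image" is now a 1-pixel-wide strip, so the window can
    # only slide along one axis and all vertical structure is ignored.
    bad = structural_similarity(ref.reshape(-1, 3), img.reshape(-1, 3),
                                channel_axis=-1, data_range=255)

    print(good, bad)   # two different numbers; only the first is meaningful

On older scikit-image versions, like the one in the question, the equivalent calls are `compare_ssim(..., multichannel=True)`.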

Pam