
I have two identical images whose image properties and file properties (e.g. CreationDate) differ. When I calculate a hash of each file, I get different hashes. Is there any way to skip such properties so that both images produce the same hash?

Awaiting help. Thanks

Kishan

1 Answer


You can read the decoded image (pixel) data into a byte array and hash that byte array.

That way, differences in metadata do not affect the hash.
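A minimal sketch of this idea in Python, assuming the Pillow library is available for decoding. The function names `hash_pixels` and `hash_image_file` are made up for illustration, and SHA-256 is just one reasonable choice of hash:

```python
import hashlib


def hash_pixels(pixels: bytes) -> str:
    """Return the SHA-256 hex digest of raw pixel bytes."""
    return hashlib.sha256(pixels).hexdigest()


def hash_image_file(path: str) -> str:
    """Decode an image file and hash only its pixel data."""
    # Assumption: Pillow is installed. It is imported lazily so that
    # hash_pixels() can be used without it.
    from PIL import Image

    with Image.open(path) as img:
        # Convert to RGB so the byte layout does not depend on the stored
        # mode. Note that this drops any alpha channel.
        return hash_pixels(img.convert("RGB").tobytes())
```

Two files whose decoded pixels are identical should then hash the same, regardless of EXIF fields or file timestamps. Lossy re-encoding (e.g. re-saving a JPEG) changes the pixels, so it would still change the hash.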

Since the 2D data is read into a 1D array, you can construct cases where two images with different dimensions have the same hash. For example, consider a 2x2 image and a 4x1 image, where R means red and B means blue (just to pick two colors):

RB
BR

and

RBBR

Both flatten to the same byte sequence (R, B, B, R), so both would have the same hash. If that matters to you, prepend (or append) the width and height of the image to the byte array before hashing.
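The collision and the fix can be sketched like this. The helper name `hash_pixels_with_size` and the fixed 8-byte header layout are assumptions for illustration; any unambiguous encoding of the dimensions would do:

```python
import hashlib
import struct


def hash_pixels_with_size(width: int, height: int, pixels: bytes) -> str:
    """Hash pixel bytes prefixed with the image dimensions."""
    # Two big-endian unsigned 32-bit integers: width, then height.
    header = struct.pack(">II", width, height)
    return hashlib.sha256(header + pixels).hexdigest()


R = b"\xff\x00\x00"  # one red RGB pixel
B = b"\x00\x00\xff"  # one blue RGB pixel

# The 2x2 image (RB / BR) and the 4x1 image (RBBR) flatten to the same bytes.
flat = R + B + B + R

square = hash_pixels_with_size(2, 2, flat)
strip = hash_pixels_with_size(4, 1, flat)
```

Hashing `flat` alone gives one digest for both images, while `square` and `strip` differ because the header bytes differ.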

Eric J.
  • Thank you! Is there any solution for video formats? – Kishan Mar 09 '16 at 02:23
    There's a lot more data involved, but the same basic approach. You could probably grab a few seconds of data from the middle of the video and use that if performance is key. I would not grab the beginning or end as some videos have the same lead-in (e.g. if the same company made them) – Eric J. Mar 09 '16 at 04:21
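A rough sketch of the "sample from the middle" idea, working on raw file bytes rather than decoded frames. The name `hash_middle_chunk` and the 1 MiB default are made up for illustration. It only gives matching hashes if the metadata differences do not shift the byte offsets of the sampled region; otherwise you would need to decode frames with a video library and hash those instead:

```python
import hashlib
import os


def hash_middle_chunk(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash up to chunk_size bytes taken from the middle of a file."""
    size = os.path.getsize(path)
    # Center the window; clamp to 0 for files smaller than chunk_size.
    start = max(0, (size - chunk_size) // 2)
    with open(path, "rb") as f:
        f.seek(start)
        return hashlib.sha256(f.read(chunk_size)).hexdigest()
```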