I have a C# program that compares two .jpg files.
I was using this function, which I found on the internet. It works well, but it is very slow (it takes more than a second per comparison):

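// Requires: using System; using System.Drawing; using System.Drawing.Imaging; using System.IO;
public static bool ImageCompareString(Bitmap firstImage, Bitmap secondImage)
{
    // Use one stream per image: rewinding a shared stream without resetting
    // its length would leave stale bytes from the first image in the buffer.
    using (MemoryStream firstStream = new MemoryStream())
    using (MemoryStream secondStream = new MemoryStream())
    {
        firstImage.Save(firstStream, ImageFormat.Png);
        secondImage.Save(secondStream, ImageFormat.Png);

        // Compare the PNG-encoded data as Base64 strings.
        string firstBitmap = Convert.ToBase64String(firstStream.ToArray());
        string secondBitmap = Convert.ToBase64String(secondStream.ToArray());
        return firstBitmap.Equals(secondBitmap);
    }
}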

Now I am wondering: why not use a checksum, which would make the comparison faster?
Is a byte-by-byte comparison more accurate than a checksum?

The reason I need to compare JPG files:
On my PC I have thousands of JPG files taken with my camera and smartphone, but many of them are duplicates (identical pictures with the same name exist in different subfolders, and some files have the same name but are not the same picture).

I want to move all the unique pictures to a new folder and delete the duplicates, so whenever two pictures have the same name I need to compare them.

Epligam
  • Bitmap does not override GetHashCode, so it will default to Object.GetHashCode, and who knows what that will do. Probably nothing even close to what you want. – Camilo Terevinto Jul 04 '17 at 15:59
  • Well, using unsafe LockBits to go pixel by pixel and bail out as soon as a comparison fails is very, very fast. This method takes 30 ms on 10,000x10,000 pixels. On a big image you can compare the map instead, since 99% of the time it has far fewer values than the number of pixels. – Franck Jul 04 '17 at 16:06
  • This [related post](https://stackoverflow.com/questions/21396745/how-do-i-compare-if-2-images-are-the-same-using-hash-bytes) includes a home-grown hash function. – Axel Kemper Jul 04 '17 at 16:21
  • This function is particularly silly. It saves the images in PNG format to a memory stream, then converts these "files" to Base64-encoded strings. Finally, it compares the strings. No surprise it is that slow. Never rely on code that you don't understand! –  Jul 04 '17 at 21:03
  • It would be useful to tell us exactly why you need to compare those images. There are several possible ways to understand this process. –  Jul 04 '17 at 21:14
  • What you want to do can easily be done with [Digikam](https://www.digikam.org/) and probably other software too. I even think it supports some kind of perceptual hashing, which is the approach to use in general but might not even be needed in your case (the description is incomplete). Writing perceptual-hashing algorithms yourself is not that hard, but I would recommend using already-tested software. – sascha Jul 05 '17 at 22:34

1 Answer

With such a prototype, the function does not compare JPEG image files but decompressed bitmaps, wherever they come from. A pixel-by-pixel comparison is efficient (using LockBits, as advised by Franck). Computing and comparing checksums can be faster because it is a branchless operation, but be sure to use a fast formula.
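A minimal sketch of the LockBits approach, assuming System.Drawing is available. The class and method names (`BitmapComparer`, `PixelsEqual`) are hypothetical, and this version copies rows into managed arrays rather than using unsafe pointers, so it will be slower than the unsafe variant mentioned in the comments:

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

public static class BitmapComparer
{
    // Hypothetical helper: true when both bitmaps have identical pixels.
    public static bool PixelsEqual(Bitmap a, Bitmap b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a.Width != b.Width || a.Height != b.Height) return false;

        var rect = new Rectangle(0, 0, a.Width, a.Height);
        // Request 32bpp ARGB for both so rows have the same layout.
        BitmapData da = a.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        try
        {
            BitmapData db = b.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
            try
            {
                int rowBytes = a.Width * 4;
                var rowA = new byte[rowBytes];
                var rowB = new byte[rowBytes];
                for (int y = 0; y < a.Height; y++)
                {
                    // Stride may include padding (or be negative), so address each row explicitly.
                    Marshal.Copy(new IntPtr(da.Scan0.ToInt64() + (long)y * da.Stride), rowA, 0, rowBytes);
                    Marshal.Copy(new IntPtr(db.Scan0.ToInt64() + (long)y * db.Stride), rowB, 0, rowBytes);
                    for (int i = 0; i < rowBytes; i++)
                    {
                        if (rowA[i] != rowB[i]) return false; // stop at the first difference
                    }
                }
                return true;
            }
            finally { b.UnlockBits(db); }
        }
        finally { a.UnlockBits(da); }
    }
}
```

Checking the dimensions first rejects most non-matching pairs without touching any pixel data.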

If your goal is to compare files, not images, avoid loading the files as bitmaps, as this involves costly decompression and increases the memory footprint. Note, however, that a file-level comparison will also report differences in tags or file organization even if the images themselves are the same.
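As a rough sketch of a file-level comparison that never decodes the image, the files can be read in blocks and compared directly. The names `FileComparer`, `FilesEqual`, and `ReadFull`, as well as the 64 KB buffer size, are illustrative assumptions:

```csharp
using System;
using System.IO;

public static class FileComparer
{
    // Hypothetical helper: true when both files contain exactly the same bytes.
    public static bool FilesEqual(string pathA, string pathB)
    {
        var infoA = new FileInfo(pathA);
        var infoB = new FileInfo(pathB);
        // Files of different length cannot be identical; no need to read them.
        if (infoA.Length != infoB.Length) return false;

        const int BufferSize = 64 * 1024; // arbitrary block size
        var bufA = new byte[BufferSize];
        var bufB = new byte[BufferSize];
        using (FileStream fa = infoA.OpenRead())
        using (FileStream fb = infoB.OpenRead())
        {
            while (true)
            {
                int readA = ReadFull(fa, bufA);
                int readB = ReadFull(fb, bufB);
                if (readA != readB) return false;
                if (readA == 0) return true; // both streams ended without a difference
                for (int i = 0; i < readA; i++)
                {
                    if (bufA[i] != bufB[i]) return false; // stop at the first difference
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until the
    // buffer is full or the end of the stream is reached.
    private static int ReadFull(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Two files holding the same picture but different metadata will compare as different with this approach, which is the limitation described above.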

Last but not least, comparing images for similarity is a completely different story.

  • I don't see how calculating a checksum could be faster than side-by-side comparison of the image data. The checksum calculation algorithm would need to go over _all_ bytes, and that would need to be done on both images, whereas side-by-side comparison can abort at the first difference. – Nyerguds Sep 05 '17 at 13:52
  • @Nyerguds: in case the images are identical, branchless processing will be faster. –  Sep 05 '17 at 13:55
  • @Nyerguds: yep. This makes sense when most of the comparisons return equal. It's also possible to compute checksums per chunk. –  Sep 05 '17 at 14:07
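To illustrate the trade-off discussed in these comments: a checksum has to read every byte, but it only has to be computed once per file, which helps when each file is checked against many others. The sketch below is one possible way to do that, not the method from the answer. It assumes SHA-256 as the checksum, and the names `DuplicateFinder`, `HashFile`, and `GroupDuplicates` are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DuplicateFinder
{
    // Hypothetical helper: SHA-256 of the file contents as a hex string.
    public static string HashFile(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream fs = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(fs));
        }
    }

    // Returns groups of paths whose contents hash to the same value.
    public static List<List<string>> GroupDuplicates(IEnumerable<string> paths)
    {
        return paths
            // Cheap pre-filter: only files of equal length can be identical.
            .GroupBy(p => new FileInfo(p).Length)
            .Where(sameSize => sameSize.Count() > 1)
            // Hash only the remaining candidates, once per file.
            .SelectMany(sameSize => sameSize.GroupBy(p => HashFile(p)))
            .Where(sameHash => sameHash.Count() > 1)
            .Select(sameHash => sameHash.ToList())
            .ToList();
    }
}
```

Equal hashes make identical content extremely likely rather than certain, so a final byte-by-byte check is a reasonable safeguard before deleting anything.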