
I need to compare about 2000 images, but looping over them uses all of the processors.

Here is how I'm comparing the images:

NSImage *file = [[NSImage alloc] initWithContentsOfFile:path];
NSImage *fileTwo = [[NSImage alloc] initWithContentsOfFile:pathTwo];
NSData *imgDataOne = [file TIFFRepresentation];
NSData *imgDataTwo = [fileTwo TIFFRepresentation];

if ([imgDataOne isEqualToData: imgDataTwo])
{
    NSLog(@"is the same image");
}

Am I doing something wrong in the comparison? How can I compare the images without taking over my computer's processors?

pkamb
user2924482
  • You need to think carefully about, and define, what constitutes *"equal"*. Are a black 32x32 pixel GIF and a black 32x32 PNG *equal*? Are two identical looking PNG files but created on different dates *equal*? Then you can decide your strategy - whether you simply calculate the MD5 checksums of all your images and run them through `uniq` or whatever. – Mark Setchell Dec 02 '15 at 22:07
  • @MarkSetchell But what would be the best way to avoid consuming all the processor time and memory? – user2924482 Dec 02 '15 at 22:13
  • You need to answer the other question first! The easiest way is to calculate the checksum of each file once and store them all in a list and then look for duplicates in the list - but that won't tell you the right answer if you want a black GIF to be identified as matching a black PNG of the same size... – Mark Setchell Dec 02 '15 at 22:19
  • @MarkSetchell All the images are JPEGs. I'm comparing JPEGs with JPEGs, and they are color images. – user2924482 Dec 02 '15 at 22:25

1 Answer


The fastest way is to get a list of all the files, get the size of each one, and rule out any pair whose sizes differ, since files of different sizes cannot be identical. That is fast because it only needs the file metadata; the file contents never have to be read from disk.
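As a rough illustration of the size check, here is a minimal sketch in plain C (which also compiles inside an Objective-C `.m` file). It assumes a POSIX system where `stat()` is available; the helper names `file_size` and `same_size` are made up for this example and are not part of any framework.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the size of the file in bytes,
   or -1 if the file cannot be stat'ed. Only metadata is queried;
   the file contents are never read. */
static long long file_size(const char *path) {
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0) {
        return -1;
    }
    return (long long)st.st_size;
}

/* Hypothetical helper: returns 1 when both files exist and have the
   same byte size (so they are still candidates for being identical),
   and 0 otherwise. */
static int same_size(const char *a, const char *b) {
    long long size_a = file_size(a);
    long long size_b = file_size(b);
    return size_a >= 0 && size_a == size_b;
}
```

Only pairs for which `same_size` returns 1 need any further comparison; every other pair can be discarded without opening either file.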

Once you find two or more files of equal size, you can compute the MD5 checksum of each to see whether the contents are identical. If you store the checksums as you calculate them, this is again faster than comparing every pair of files, since each file is read only once.
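The "hash each file once, then compare the stored hashes" idea could look roughly like the sketch below. To stay dependency-free it uses a 64-bit FNV-1a hash as a stand-in for MD5, which is an assumption of this example: FNV-1a is not collision resistant, so a real program would more likely use MD5 or another digest, or confirm matches byte by byte. The names `file_hash` and `count_duplicate_pairs` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hashes a file's contents with 64-bit FNV-1a (stand-in for MD5).
   Returns 0 and stores the hash in *out on success, -1 on I/O error. */
static int file_hash(const char *path, uint64_t *out) {
    unsigned char buf[65536];
    uint64_t h = 14695981039346656037ULL;      /* FNV-1a offset basis */
    size_t n;
    FILE *f;

    assert(path != NULL && out != NULL);
    f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 1099511628211ULL;             /* FNV-1a prime */
        }
    }
    int failed = ferror(f);
    fclose(f);
    if (failed) {
        return -1;
    }
    *out = h;
    return 0;
}

/* Reads and hashes every file exactly once, storing the results in
   hashes[] and valid[], then compares only the stored hashes.
   Prints each matching pair and returns the number of pairs found. */
static int count_duplicate_pairs(const char *const *paths, int count,
                                 uint64_t *hashes, int *valid) {
    int pairs = 0;
    for (int i = 0; i < count; i++) {
        valid[i] = (file_hash(paths[i], &hashes[i]) == 0);
    }
    for (int i = 0; i < count; i++) {
        for (int j = i + 1; j < count; j++) {
            if (valid[i] && valid[j] && hashes[i] == hashes[j]) {
                printf("same hash: %s and %s\n", paths[i], paths[j]);
                pairs++;
            }
        }
    }
    return pairs;
}
```

The pairwise loop here touches only the stored 64-bit values, so the disk is read once per file rather than once per comparison. For 2000 files that loop is cheap; sorting the hashes would be an alternative for much larger sets.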

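There is certainly no need to create the TIFFRepresentation of each file: that forces every JPEG to be decoded and re-encoded as TIFF, which is likely where most of the CPU time and memory is going.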

Mark Setchell