
Hi, I recently formatted my phone and backed up my photos to my PC. When I wanted to copy the photos back to my phone, I saw that I had multiple duplicates of some images. I wanted to merge all my photos into one folder and then upload it to my phone, so I wrote some Java code.

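import java.io.File;
import java.io.FileInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;

public class Main {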

public static int imgCtr = 1;
public static File dest = new File("D:\\finalfinal");

public static void main(String[] args) throws Exception {
    getContent("D:\\restoreFinal");
    getContent("D:\\restore1");
    getContent("D:\\restore2");
}

public static String getExtension(String fileName) {
    String extension = "";

    int i = fileName.lastIndexOf('.');
    if (i > 0) {
        extension = fileName.substring(i + 1);
    }
    return extension;
}

public static boolean isImage(String extension) {
    if (extension.equalsIgnoreCase("jpg") || extension.equalsIgnoreCase("jpeg")
            || extension.equalsIgnoreCase("png"))
        return true;
    return false;
}

public static boolean compareImages(File a, File b) throws Exception {
    FileInputStream fisA = new FileInputStream(a);
    FileInputStream fisB = new FileInputStream(b);
    byte contentA[] = new byte[(int) a.length()];
    byte contentB[] = new byte[(int) b.length()];
    fisA.read(contentA);
    fisB.read(contentB);
    String strA = new String(contentA);
    String strB = new String(contentB);
    fisA.close();
    fisB.close();
    return strA.equals(strB);
}

public static void getContent(String path) throws Exception {
    File source = new File(path);
    ArrayList<File> content = new ArrayList<File>(Arrays.asList(source.listFiles()));
    while (!content.isEmpty()) {
        File f = content.get(0);
        if (isImage(getExtension(f.getName()))) {
            if (dest.listFiles().length == 0) {
                Path p = Paths.get(dest + "\\i" + imgCtr + "." + getExtension(f.getName()));
                imgCtr++;
                Files.move(f.toPath(), p);
                System.out.println(imgCtr);
            } else {
                File[] alreadyThere = dest.listFiles();
                boolean match = false;
                for (File cmp : alreadyThere) {
                    if (compareImages(f, cmp)) {
                        match = true;
                        break;
                    }
                }
                if (!match) {
                    Path p = Paths.get(dest + "\\i" + imgCtr + "." + getExtension(f.getName()));
                    imgCtr++;
                    Files.move(f.toPath(), p);
                    System.out.println(imgCtr);
                }
            }
        }
        content.remove(0);
    }
}

}

I compared the images as strings because comparing pixels took really long (I had around 2k photos). The problem is that it somehow copies a photo multiple times, with no difference between the copies that I can see. I checked the source folders, but it seems to duplicate photos arbitrarily: even photos that had no duplicates in the source ended up with duplicates in the destination folder. I doubt it is about the compare method, but I couldn't find my mistake.

So can you help me find my mistake, or suggest a faster and more reliable way to compare images?

BrokenFrog
  • @MeetTitan do you suggest comparing chunks of pixels? I believe that would still take a long time because I have similar photos. Or is there a way to compare, let's say, the top 250x250 px quickly, without n^2 complexity? – BrokenFrog Apr 21 '16 at 19:46
  • You may want to simply start with a file size and file checksum comparison, and only then go on to a more processor-intensive comparison. – Andre M Apr 21 '16 at 19:48
  • @AndreM that is actually a really nice idea, thanks. But how do I implement the checksum of the photo? Should I take some random pixels or hash the string? And if I hash the string, should I be worried about different photos producing the same hash? – BrokenFrog Apr 21 '16 at 19:55
  • Ignore the pixel contents and checksum the file itself instead. If the checksum and file size are the same, then you should be good. – Andre M Apr 21 '16 at 19:57

1 Answer

Comparing pixels is fine if the images haven't been resaved or passed through a lossy file format such as JPEG. If they haven't, start with a checksum comparison, and only if the checksums don't match do a more extensive pixel comparison. Images that have been through a lossy algorithm will require a different approach.
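As a rough illustration of the size-then-checksum check, here is a minimal sketch. The class and method names (`FileChecksum`, `md5Hex`, `sameContent`) are made up for this example, and MD5 is used only because it is the digest discussed in this thread; any other algorithm supported by `MessageDigest` could be substituted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the code in the question.
class FileChecksum {

    // Streams the file through the digest in chunks and returns the MD5 as lowercase hex.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            // read() may return fewer bytes than requested, so loop until end of file.
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Cheap size check first; the checksum is only computed when the sizes are equal.
    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        return md5Hex(a).equals(md5Hex(b));
    }

    public static void main(String[] args) throws Exception {
        // Two temporary files with identical bytes stand in for two copies of a photo.
        Path a = Files.createTempFile("photo", ".jpg");
        Path b = Files.createTempFile("photo", ".jpg");
        Files.write(a, new byte[] {1, 2, 3});
        Files.write(b, new byte[] {1, 2, 3});
        System.out.println(sameContent(a, b));
    }
}
```

This works on the raw file bytes, so it only detects byte-for-byte copies; a photo that has been re-encoded would need the pixel-level or lossy-aware approach mentioned above.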

Andre M
  • Why would it matter if it is JPEG? Most of them are JPEG; a few are PNG. – BrokenFrog Apr 21 '16 at 19:58
  • If the images haven't been modified and are the same then it won't matter. But if you saved the same image as a JPEG twice, there is no guarantee the pixel data will be exactly the same. Remember JPEG is lossy. If they came from your phone and you are simply copying them over twice, then they are going to be the same. – Andre M Apr 21 '16 at 19:59
  • I see, thanks for the help, but there is one thing I don't understand. With checksum comparison, you suggest I create a hash of each file and compare the hashes, right? If so, the algorithm would be: 1) compare sizes; 2) if they match, compare checksums (if those don't match, the images are different). But the thing is, I didn't even bother with a checksum: I lost some speed, but I compared the whole content of the image. Even so, I got some duplicates. I don't think a checksum will prevent this. – BrokenFrog Apr 21 '16 at 20:02
  • See the first link in the answer, shared again here: http://stackoverflow.com/questions/304268/getting-a-files-md5-checksum-in-java – Andre M Apr 21 '16 at 20:05
  • Sorry, I edited the previous comment; I had posted it before finishing describing my problem. Do you suggest that if the checksums match (which means the files match), I compare the pixels to check for any difference? – BrokenFrog Apr 21 '16 at 20:07
  • I don't think it would be worth doing more work. If you have any doubts, put together a simple application and try it out. BTW, if the files are not the same size, don't bother comparing checksums, since you already know they are different. – Andre M Apr 21 '16 at 21:00
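To tie the comments above together, here is one possible sketch of a whole de-duplication pass that skips the checksum when a file's size is unique and otherwise looks the checksum up in a set, so no file is compared against every file already kept. `DuplicateFilter` and `unique` are hypothetical names chosen for this example, and it assumes each photo fits comfortably in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the code in the question.
class DuplicateFilter {

    // Returns the first file seen for each distinct content, in input order.
    static List<Path> unique(List<Path> files) throws IOException, NoSuchAlgorithmException {
        // Pass 1: count how many files share each size.
        Map<Long, Integer> sizeCount = new HashMap<>();
        for (Path f : files) {
            sizeCount.merge(Files.size(f), 1, Integer::sum);
        }

        MessageDigest md = MessageDigest.getInstance("MD5");
        Set<String> seen = new HashSet<>();
        List<Path> kept = new ArrayList<>();

        // Pass 2: only files that share a size with another file get a checksum.
        for (Path f : files) {
            long size = Files.size(f);
            if (sizeCount.get(size) == 1) {
                kept.add(f); // unique size, so it cannot be a duplicate
                continue;
            }
            // Assumption: the file is small enough to read fully into memory.
            byte[] digest = md.digest(Files.readAllBytes(f));
            String key = size + ":" + Base64.getEncoder().encodeToString(digest);
            if (seen.add(key)) {
                kept.add(f); // first file with this size and checksum
            }
        }
        return kept;
    }

    public static void main(String[] args) throws Exception {
        // Small temporary files stand in for photos from the restore folders.
        Path dir = Files.createTempDirectory("dedupe");
        Path a = Files.write(dir.resolve("a.jpg"), new byte[] {1, 2, 3});
        Path b = Files.write(dir.resolve("b.jpg"), new byte[] {1, 2, 3});
        Path c = Files.write(dir.resolve("c.jpg"), new byte[] {1, 2, 4});
        System.out.println(unique(Arrays.asList(a, b, c)));
    }
}
```

The set lookup replaces the inner loop over the destination folder, so each file is read at most once instead of once per file already copied.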