Background

I'm hashing a set of files and need to know whether a file with the same contents has already been seen. I store each hash (the digest, as a byte[]) and the file path (as a Path) in a Hashtable so that lookups are quick and I can print a proper message about duplicates.

Issue

The problem is that Hashtable, or byte[] itself, appears to use the array's address (or perhaps its toString() value) as the key rather than the array's contents. I could convert the digest to a human-readable string, but I'm worried about the overhead because there are a lot of small files to check.
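To make the symptom concrete, here is a minimal sketch that is separate from the program below. The class name ArrayKeyDemo and the sample bytes are made up for illustration. It shows that two distinct byte[] instances with identical contents are not treated as the same Hashtable key, because arrays inherit equals() and hashCode() from Object, which compare by identity.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical demo class; not part of the original program.
class ArrayKeyDemo {

    // Stores one array as a key, then looks up a different array
    // that holds the same bytes.
    static boolean foundByContent() {
        Hashtable<byte[], String> table = new Hashtable<>();
        byte[] first = {1, 2, 3};
        byte[] second = {1, 2, 3}; // same contents, different object
        table.put(first, "file.txt");
        return table.containsKey(second);
    }

    public static void main(String[] args) {
        byte[] first = {1, 2, 3};
        byte[] second = {1, 2, 3};
        System.out.println(first.equals(second));         // false: identity comparison
        System.out.println(Arrays.equals(first, second)); // true: content comparison
        System.out.println(foundByContent());             // false: key not found
    }
}
```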

Question

Am I correct that the hashtable is keying on the array's address rather than its contents? If so, how do I make sure the contents are used instead?

Code

static Hashtable<byte[], Path> htable = new Hashtable<byte[], Path>();

public static void main(String[] args) throws NoSuchAlgorithmException, IOException {

    String file = "Z:\\file.txt";
    String file2 = "Z:\\file2.txt";
        
    // First file
    Path filePath = Paths.get(file);
    doHashStore(filePath);
        
    // Second file
    filePath = Paths.get(file2);
    doHashStore(filePath);
        
    // First again, should notify already exists
    filePath = Paths.get(file);
    doHashStore(filePath);
}

// Add to table
private static void doHashStore(Path p) throws NoSuchAlgorithmException, IOException{
    byte[] md5 = doMD5(p);
    if(htable.containsKey(md5)) {
        System.out.println(p.toString()+" already in table! "+toHexadecimal(md5));
    } else {
        htable.put(md5, p);
        System.out.println(p.toString()+" added to table "+toHexadecimal(md5));
    }
}

// Perform the md5 hash
private static byte[] doMD5(Path filePath) throws IOException, NoSuchAlgorithmException{
    byte[] b = Files.readAllBytes(filePath);
    byte[] c = MessageDigest.getInstance("MD5").digest(b);
    return c;
}

// Convert the byte[] from digest into a human readable hash string
private static String toHexadecimal(byte[] digest){
    String hash = "";
    for(byte aux : digest) {
        int b = aux & 0xff;
        if (Integer.toHexString(b).length() == 1)
            hash += "0";
        hash += Integer.toHexString(b);
    }
    return hash;
}
Kelly Bang
  • Your assumption is correct but the *address* may not always be used as the `hashCode`. Also arrays are dynamic classes so you can't help this. – Chetan Kinger Jan 29 '17 at 06:29
  • The solution will be to create a simple wrapper class for your `byte[]` objects that overrides `equals` and `hashCode` as you need. – Stephen C Jan 29 '17 at 06:45
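The wrapper suggested in the comment above could look roughly like the following. This is a sketch under assumptions, not a definitive implementation: the names DigestKey and DigestKeyDemo are hypothetical, and the defensive copy in the constructor is a design choice rather than a requirement. It relies on Arrays.equals and Arrays.hashCode, which compare and hash by content.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical wrapper: gives a byte[] value-based equals() and hashCode().
final class DigestKey {
    private final byte[] bytes;

    DigestKey(byte[] digest) {
        // Defensive copy so later changes to the caller's array
        // cannot alter a key that is already stored in a table.
        this.bytes = digest.clone();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof DigestKey
                && Arrays.equals(bytes, ((DigestKey) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}

// Hypothetical demo of a lookup with an equal-content key.
class DigestKeyDemo {
    public static void main(String[] args) {
        Hashtable<DigestKey, String> table = new Hashtable<>();
        table.put(new DigestKey(new byte[] {1, 2, 3}), "Z:\\file.txt");
        // A separate key object with the same bytes is found.
        System.out.println(table.containsKey(new DigestKey(new byte[] {1, 2, 3}))); // true
    }
}
```

In the question's code this would mean declaring the table as Hashtable<DigestKey, Path> and wrapping the result of doMD5 before calling containsKey and put. If a custom class is unwanted, java.nio.ByteBuffer.wrap(md5) may serve as a key as well, since ByteBuffer defines equals and hashCode over its remaining contents.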
