My current way of generating MD5 hashes of all files under a root directory, up to a given depth, is shown below. At the moment it takes about 10 seconds (on an old Intel Core i3 CPU) to process roughly 300 images, each 5-10 MB on average. The parallel() option on the stream does not help: with or without it, the time stays more or less the same. How can I make this faster?
// Files.walk must be closed, hence the try-with-resources
try (Stream<Path> stream = Files.walk(Path.of(rootDir), depth)) {
    stream.parallel()                               // doesn't help, time approx. the same as without parallel()
          .filter(path -> !Files.isDirectory(path)) // skip directories
          .map(FileHash::getHash)
          .collect(Collectors.toList());
}
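One thing I noticed while investigating: Files.walk returns a stream built on an iterator-based spliterator of unknown size, which reportedly splits poorly, so .parallel() may have little work to distribute. Below is an untested sketch of a variant I am considering (hashAll is just a hypothetical wrapper name): it collects the paths into a list first, so the parallel stream starts from a sized source that splits well.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static List<String> hashAll(String rootDir, int depth) throws IOException {
    // Collect the paths eagerly; an ArrayList has a sized spliterator
    // that splits evenly across worker threads.
    List<Path> files;
    try (Stream<Path> walk = Files.walk(Path.of(rootDir), depth)) {
        files = walk.filter(path -> !Files.isDirectory(path))
                    .collect(Collectors.toList());
    }
    // Hash in parallel from the well-splitting list source.
    return files.parallelStream()
                .map(FileHash::getHash)
                .collect(Collectors.toList());
}

Even with good splitting, I realize that if the disk is the bottleneck (especially a spinning one), adding threads may not help much.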
The getHash method used above produces one output line per file in the stream, in the form hash,<full file path>.
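For example, one output line would look like this (hypothetical file path):

D41D8CD98F00B204E9800998ECF8427E,/home/me/photos/IMG_0001.jpg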
public static String getHash(Path path) {
    try {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        md5.update(Files.readAllBytes(path)); // reads the whole file into memory
        String hash = DatatypeConverter.printHexBinary(md5.digest()); // already upper case
        return String.format("%s,%s", hash, path.toAbsolutePath());
    } catch (NoSuchAlgorithmException | IOException e) {
        throw new RuntimeException(e); // fail fast instead of returning a bogus line
    }
}
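Since getHash reads each whole file into memory with readAllBytes, I also put together a streaming variant (getHashStreaming is a hypothetical name, untested) in case allocation and GC pressure contribute to the slowdown. It assumes Java 17+ for HexFormat:

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public static String getHashStreaming(Path path) {
    try {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // DigestInputStream updates the digest as the file is read,
        // so only one buffer's worth of data is in memory at a time.
        try (InputStream in = new DigestInputStream(Files.newInputStream(path), md5)) {
            byte[] buffer = new byte[1 << 16]; // 64 KiB read buffer
            while (in.read(buffer) != -1) {
                // reading drives the digest; nothing else to do here
            }
        }
        String hash = HexFormat.of().withUpperCase().formatHex(md5.digest());
        return String.format("%s,%s", hash, path.toAbsolutePath());
    } catch (NoSuchAlgorithmException e) {
        throw new IllegalStateException(e);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

I have not benchmarked this yet, so I don't know whether it actually helps for files in the 5-10 MB range.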