I have the following (pretty nasty) code snippet, which generates an MD5 hash over the content of an item:

protected String createHashFromContentNew() throws CrawlerException {
    final StringBuilder builder = new StringBuilder();
    if (getContent() != null) {
        builder.append(new String(getContent()));
    }
    if (builder.length() == 0) {
        throw new CrawlerException(hashErrorMessage("the content of this item is empty!"));
    } else {
        return MD5Utils.generateMD5Hash(builder.toString());
    }
}

MD5Utils.generateMD5Hash() can also be called with an InputStream instead of a String.

getContent() returns a byte[].

This worked fine until I got items with very large contents. Since the code runs in a multi-threaded environment, it uses a lot of RAM, because each call holds the content several times (the byte[], the String, and the StringBuilder).

I now want to use generateMD5Hash() with an InputStream so that the whole content is no longer loaded into RAM. The problem is that the resulting hash must be identical to the one the current function produces, so that all previously generated hashes stay valid.

Any ideas how to achieve that in a proper way?

kuche

1 Answer

Maybe you want a ByteArrayInputStream? It can wrap the byte[] returned by getContent() directly.

Have a look here.
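One caveat on keeping the old hashes: the current code turns the bytes into a String with the platform default charset before hashing, and how MD5Utils turns that String back into bytes is not shown in the question. A hash over the raw bytes will only match if that round trip leaves the bytes unchanged. The following is a minimal sketch, not the MD5Utils implementation: the class StreamingMd5Sketch and both method names are made up for illustration, it hashes with the JDK's MessageDigest, and it assumes UTF-8 as a stand-in for whatever charset the old path really used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the question's code base.
class StreamingMd5Sketch {

    // Reads the stream in 8 KiB chunks and feeds each chunk to the digest,
    // so only the buffer is allocated in addition to the source data.
    public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Imitates the old path under the ASSUMPTION that both conversions use UTF-8:
    // bytes -> String -> bytes -> hash. Only meant for comparing results.
    public static String md5HexViaString(byte[] content) throws IOException, NoSuchAlgorithmException {
        byte[] roundTripped = new String(content, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return md5Hex(new ByteArrayInputStream(roundTripped));
    }

    public static void main(String[] args) throws Exception {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        // ByteArrayInputStream wraps the existing array without copying it.
        System.out.println(md5Hex(new ByteArrayInputStream(content)));
        System.out.println(md5HexViaString(content));
    }
}
```

For valid text in the assumed charset both methods give the same hash. For content that is not valid in that charset (binary data, for example) the String conversion replaces bytes and the hashes differ, so it is worth comparing both paths against a few stored hashes before switching.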