
I'm doing multiple parallel HTTP range requests and want to calculate the MD5 sum of each response using DigestInputStream. I also want to write the data from the HTTP stream to a file without creating intermediate files, so I'm using a FileChannel to write each response into its own region of a single file. This is basically a download manager application.
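Parallel range requests start from splitting the resource into byte ranges. The following is a minimal sketch, assuming the total size is already known (for example from a HEAD request's Content-Length). `RangePlanner` and `rangeHeaders` are hypothetical names and are not part of the question's code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a resource of totalLength bytes into at most
// `parts` contiguous byte ranges, formatted as HTTP Range header values.
public class RangePlanner {

    public static List<String> rangeHeaders(long totalLength, int parts) {
        if (totalLength <= 0 || parts <= 0) {
            throw new IllegalArgumentException("totalLength and parts must be positive");
        }
        List<String> headers = new ArrayList<>();
        long chunk = (totalLength + parts - 1) / parts; // ceiling division
        for (long start = 0; start < totalLength; start += chunk) {
            // HTTP byte ranges are inclusive on both ends
            long end = Math.min(start + chunk, totalLength) - 1;
            headers.add("bytes=" + start + "-" + end);
        }
        return headers;
    }

    public static void main(String[] args) {
        // A 10-byte resource split into 3 parts
        System.out.println(rangeHeaders(10, 3)); // [bytes=0-3, bytes=4-7, bytes=8-9]
    }
}
```

Each value could then be set on its request with `get.setHeader("Range", header)`. Because of the ceiling division, the last range may be shorter, and a very small resource can yield fewer ranges than requested.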

Saving the HTTP stream to the file works, but when I try to use the DigestInputStream to calculate the MD5 sum on the fly, it seems the DigestInputStream is never read. I'm probably missing some important part of how FileChannel uses the InputStream, and I hope this can be easily fixed.

I'd also welcome suggestions for optimizations that achieve the goal outlined above.

Here's the class implementing the download tasks:

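// Imports for the enclosing source file (logging and hex encoding assumed to be log4j 1.x and commons-codec)
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.commons.codec.binary.Hex;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.log4j.Logger;

private class MultiHttpClientConnThread extends Thread {
    private final Logger logger = Logger.getLogger(getClass());
    private final CloseableHttpClient client;
    private final HttpGet get;
    private final File destinationFile;
    // Not final: it is assigned in run(), not in the constructor
    private String md5sum;

    public MultiHttpClientConnThread(final CloseableHttpClient client, final HttpGet get, final File destinationFile) {
        this.client = client;
        this.get = get;
        this.destinationFile = destinationFile;
    }

    @Override
    public final void run() {
        logger.debug("Thread Running: " + getName());
        try {
            final MessageDigest messageDigest = MessageDigest.getInstance("MD5");

            // try-with-resources closes the response, file and channel even if the transfer fails
            try (CloseableHttpResponse response = client.execute(get);
                 RandomAccessFile randomAccessFile = new RandomAccessFile(destinationFile, "rw");
                 FileChannel fileChannel = randomAccessFile.getChannel()) {

                // Content-Range looks like "bytes 0-499/1234"; token [1] is the first byte of this part
                final String contentRange = response.getFirstHeader("Content-Range").getValue();
                final long startByte = Long.parseLong(contentRange.split("[ -]")[1]);
                final long length = response.getEntity().getContentLength();

                // Every byte the channel pulls from the HTTP stream also updates messageDigest
                final InputStream inputStream = response.getEntity().getContent();
                final DigestInputStream digestInputStream = new DigestInputStream(inputStream, messageDigest);
                final ReadableByteChannel readableByteChannel = Channels.newChannel(digestInputStream);

                // Write this part at its own offset in the shared destination file
                fileChannel.transferFrom(readableByteChannel, startByte, length);
            }

            md5sum = Hex.encodeHexString(messageDigest.digest());
            logger.info("Part MD5 sum: " + md5sum);
            logger.debug("Thread Finished: " + getName());
        } catch (final IOException | NoSuchAlgorithmException ex) {
            // ClientProtocolException is a subclass of IOException, so it is caught here as well
            logger.error("", ex);
        }
    }
}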
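To check the digest pipeline in isolation, here is a minimal self-contained sketch of the same DigestInputStream -> Channels.newChannel -> FileChannel.transferFrom chain, with a ByteArrayInputStream standing in for the HTTP entity (no HttpClient involved). `DigestTransferDemo`, `writePart` and `toHex` are hypothetical names; `toHex` replaces commons-codec's `Hex` so the example needs only the JDK. If the digest computed while streaming matches one computed directly over the same bytes, the DigestInputStream was read.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestTransferDemo {

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Writes `part` at `offset` in `file` and returns the MD5 computed on the fly.
    static String writePart(File file, long totalLength, long offset, byte[] part)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(new ByteArrayInputStream(part), md5);
             ReadableByteChannel source = Channels.newChannel(in);
             RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel target = raf.getChannel()) {
            // Pre-size the file so that `offset` is not beyond its current end
            raf.setLength(totalLength);
            target.transferFrom(source, offset, part.length);
        }
        return toHex(md5.digest());
    }

    public static void main(String[] args) throws Exception {
        byte[] part = "second half".getBytes(StandardCharsets.US_ASCII);
        File file = File.createTempFile("parts", ".bin");
        file.deleteOnExit();
        String streamed = writePart(file, 32, 16, part);
        String direct = toHex(MessageDigest.getInstance("MD5").digest(part));
        System.out.println(streamed.equals(direct)); // true
    }
}
```

Calling `setLength` with the same total size from every part is harmless here, but in a real download manager it would more naturally be done once, before the threads start.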

Update

This is a bit embarrassing: the code works fine. The problem was the uploaded file I used for testing. As asked in "How to create a repeatable incompressible fast InputStream in Java?", I needed a repeatable random input stream, and the one I used from here unfortunately repeats itself. Therefore all threads received the same data and produced the same MD5 sum, which looked very similar (but not identical) to the MD5 sum of an empty file.
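One way to get test data that is repeatable but does not repeat within a part is to derive it from a seeded `java.util.Random`. This is a hypothetical helper sketched for illustration, not the stream linked above and not from any library: the same seed reproduces the same bytes, while a different seed per test file gives each file different content. `java.util.Random` is not cryptographically secure, which does not matter for generating test data.

```java
import java.io.InputStream;
import java.util.Random;

// Hypothetical test helper: a finite stream of pseudo-random bytes that is
// identical for the same seed and differs across seeds.
public class SeededRandomInputStream extends InputStream {
    private final Random random;
    private final byte[] block = new byte[8192];
    private int blockPos = block.length; // forces a refill on the first read
    private long remaining;

    public SeededRandomInputStream(long seed, long length) {
        this.random = new Random(seed);
        this.remaining = length;
    }

    // Always refill a whole block, so the byte sequence does not depend on
    // how callers size their reads.
    private void refillIfNeeded() {
        if (blockPos == block.length) {
            random.nextBytes(block);
            blockPos = 0;
        }
    }

    @Override
    public int read() {
        if (remaining <= 0) {
            return -1;
        }
        refillIfNeeded();
        remaining--;
        return block[blockPos++] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        refillIfNeeded();
        int n = (int) Math.min(Math.min(len, block.length - blockPos), remaining);
        System.arraycopy(block, blockPos, b, off, n);
        blockPos += n;
        remaining -= n;
        return n;
    }
}
```

The whole-block refill matters because `Random.nextBytes` consumes the generator in 4-byte steps and discards leftover bytes; generating exactly the requested count per call would make the output depend on the caller's read sizes.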

Florian Feldhaus
  • Have you gone through it with a debugger to see what's happening? – Kayaman Nov 15 '17 at 11:50
  • Yes, I've used a debugger and I see that the position of the `digestInputStream` changes. Everything looks right, but the MD5 sum is always the MD5 sum of an empty string. – Florian Feldhaus Nov 15 '17 at 13:31
  • Well, it's not related to channels if the `DIS` is making a move. Is that the exact code you're running? Copy pasted from your IDE? – Kayaman Nov 15 '17 at 13:50
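The symptom described in these comments can be recognized directly: the MD5 of zero bytes is d41d8cd98f00b204e9800998ecf8427e, so a digest equal to that value means the DigestInputStream delivered no data before `digest()` was called. A small sketch (class and method names are hypothetical) contrasting a wrapped stream that is never read with one that is drained:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EmptyDigestCheck {
    // MD5 of zero bytes of input
    static final String EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e";

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Wraps `data` in a DigestInputStream and, if `drain` is true, reads it to EOF.
    static String digestAfterReading(byte[] data, boolean drain)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (DigestInputStream in = new DigestInputStream(new ByteArrayInputStream(data), md5)) {
            if (drain) {
                byte[] buffer = new byte[4096];
                while (in.read(buffer) != -1) {
                    // each read updates md5 as a side effect
                }
            }
        }
        return toHex(md5.digest());
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(digestAfterReading(data, false).equals(EMPTY_MD5)); // true: nothing was read
        System.out.println(digestAfterReading(data, true).equals(EMPTY_MD5));  // false
    }
}
```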

1 Answer


In the method `transferFrom`, the second argument is the position in the destination file where writing starts, and if that position is greater than the file's current size, no bytes are transferred. If this is a new file, try setting it to zero instead of using the variable `startByte` (which comes from the source response's Content-Range header).
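The `FileChannel.transferFrom` Javadoc states that if the given position is greater than the file's current size, no bytes are transferred. A minimal sketch demonstrating this on a new, empty temp file, and showing that pre-sizing the file with `RandomAccessFile.setLength` lets a part be written at a non-zero offset (the class and method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

public class TransferFromPositionDemo {

    // Returns how many bytes transferFrom wrote at `position` into a new, empty
    // temp file, after optionally pre-sizing it to `presize` bytes.
    static long transferAt(long position, long presize, byte[] data) throws IOException {
        File file = File.createTempFile("transfer", ".bin");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel target = raf.getChannel();
             ReadableByteChannel source = Channels.newChannel(new ByteArrayInputStream(data))) {
            if (presize > 0) {
                raf.setLength(presize);
            }
            return target.transferFrom(source, position, data.length);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        System.out.println(transferAt(0, 0, data));     // 100: writing at the end of an empty file is allowed
        System.out.println(transferAt(500, 0, data));   // 0: position is beyond the file's size
        System.out.println(transferAt(500, 600, data)); // 100: the file was pre-sized
    }
}
```

When several parts share one file, pre-sizing is an alternative to writing at position zero, since each part can keep its own `startByte`. Checking the value returned by `transferFrom` also makes a silent zero-byte transfer visible.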
