Avoid obtaining same InputStream more than once

Question

I can see there are a number of posts about reusing an InputStream. I understand that an InputStream is a one-time thing and cannot be reused.

However, I have a use case like this:

I have downloaded a file from DropBox by obtaining a DropBoxInputStream through DropBox's Java SDK. I then need to upload the file to another system by passing it the InputStream. However, as part of the upload, I have to provide the MD5 of the file, so I have to read the file from the stream before uploading it. Because the DropBoxInputStream I received can only be used once, I have to get another DropBoxInputStream after calculating the MD5 and before uploading the file. The procedure is:

1. Get the first DropBoxInputStream.
2. Read from the DropBoxInputStream and calculate the MD5.
3. Get the second DropBoxInputStream.
4. Upload the file using the MD5 and the second DropBoxInputStream.

I am wondering whether there is any way for me to "cache" or "back up" the InputStream before I calculate the MD5, so that I can skip step 3 of obtaining the same DropBoxInputStream again?

Many thanks

EDIT:

Sorry I missed some information.

What I am currently doing is using an MD5DigestOutputStream to calculate the MD5. I stream the data through the MD5DigestOutputStream and save it locally as a temp file. Once all the data has gone through the MD5DigestOutputStream, it calculates the MD5.

I then call a third party library to upload the file using the calculated md5 and a FileInputStream which reads from the temp file.

However, this sometimes requires a huge amount of disk space, and I want to remove the need for a temp file. The library I use only accepts an MD5 and an InputStream, which means I have to calculate the MD5 on my end. My plan is to use my MD5DigestOutputStream to write the data to /dev/null (not keeping the file) so that I can calculate the MD5, then get the InputStream from DropBox again and pass that to the library. I assume the library will be able to get the file directly from DropBox without my having to cache the file either in memory or on disk. Will it work?

I have got the improved plan working. I did get the InputStream twice but the performance is not bad. With this way I do not need the file system to be involved in the download/upload process. Thanks all. — KKKCoder, May 23 '13 at 16:56

score 3 · Accepted Answer · edited May 23 '17 at 11:50

Input streams aren't really designed to be copied or re-used; they are meant for situations where you don't want to read everything into a byte array and use array operations on it (this is especially useful when the whole array isn't available yet, as in socket communication, for example). You could buffer into a byte array, which means reading sections from the stream into a byte array buffer until you have enough information.

But that's unnecessary for calculating an MD5. Notice that InputStream is abstract, so it needs to be implemented in an extending class. It has many implementations: GZIPInputStream, FileInputStream, etc. Several of these are, in design-pattern speak, decorators: they wrap another stream and add extra functionality to it. For example, GZIPInputStream decompresses gzip data as you read from the stream it wraps.

So, what you need is a stream that does this for MD5. There is, joyfully, a well-documented class for exactly that: see this answer. You should be able to pass your DropBox input stream (it is itself an InputStream) to the constructor of a new DigestInputStream, and then you can both take the MD5 and continue to read as before.
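As a rough sketch of that idea, the snippet below wraps an arbitrary InputStream in java.security.DigestInputStream and copies it to an OutputStream, updating the MD5 as the bytes pass through. The class name DigestWhileReading, the method copyAndMd5, and the in-memory source and sink are made up for illustration; a real DropBox stream and upload target would take their place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of any SDK.
class DigestWhileReading {

    /** Copies source to sink and returns the MD5 of the copied bytes as lowercase hex. */
    static String copyAndMd5(InputStream source, OutputStream sink)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // Every byte read through the wrapper also updates the digest.
        try (DigestInputStream in = new DigestInputStream(source, md5)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        // The digest is only complete once the stream has been read to the end.
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the DropBox stream (assumption for the demo).
        InputStream source = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(copyAndMd5(source, sink));
    }
}
```

Note that the digest is available only after the stream has been fully consumed, so this avoids a second pass only if the MD5 can be supplied after the data has been read.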

Worried about type casting? The idea with decorators in Java is that, since the InputStream base class declares all the methods you need to do your IO, there's no harm in passing any object that inherits from InputStream to the constructor of another stream implementation; you can still do the same core IO.

Finally, I should probably answer your actual question: say you still want to "cache" or "back up" the stream anyway? Well, you could just write it to a byte array. This is well documented, but can become a faff when your streams get more complicated. Alternatively, try looking at PushbackInputStream. With it, you can easily write a function that reads off n bytes, performs an operation on them, and then restores them to the stream. It is generally good to avoid this kind of stream in Java, as it's bad for memory use, but it is no worse than buffering everything up, which you'd otherwise have to do.
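A minimal sketch of the PushbackInputStream approach might look like the following. The class PeekWithPushback and its peek method are hypothetical names; the pushback buffer size passed to the constructor must be at least as large as the number of bytes you intend to restore, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for illustration.
class PeekWithPushback {

    /** Reads up to n bytes, then pushes them back so later reads see them again. */
    static byte[] peek(PushbackInputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int r = in.read(buf, total, n - total);
            if (r == -1) {
                break; // stream ended before n bytes were available
            }
            total += r;
        }
        // Restore exactly the bytes that were read.
        in.unread(buf, 0, total);
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream("abcdef".getBytes(StandardCharsets.UTF_8));
        // The second argument is the pushback buffer size.
        PushbackInputStream in = new PushbackInputStream(source, 4);
        byte[] head = peek(in, 4);
        System.out.println(new String(head, StandardCharsets.UTF_8));
        // The peeked bytes are still in the stream.
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Since an MD5 needs every byte of the file, using this for the whole download would mean a pushback buffer as large as the file itself, which is the memory cost mentioned above.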

Or, of course, I would have a go with DigestInputStream.

Hope this helps,

Best.

Great- so you're right about /dev/null; but perhaps I haven't understood but why can't you use input streams all the way? How do you end up with an output stream? — Tom, May 23 '13 at 12:04
Since you're getting these files in memory anyway, you may as well stay there. Try http://commons.apache.org/proper/commons-io/javadocs/api-1.4/org/apache/commons/io/IOUtils.html to fill in any gaps. — Tom, May 23 '13 at 12:43

score 1 · Answer 2 · answered May 23 '13 at 08:39

You don't need to open a new InputStream from DropBox.

Once you have read the file from DropBox, you have it locally. So it is either in memory (in a byte array) or you stored it in a local file. Now you can create an InputStream that reads the data from memory (ByteArrayInputStream) or disk (FileInputStream) in order to upload the file.

So instead of caching the InputStream (which you can't) you cache the contents (which you can).
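For the in-memory variant, a sketch under the assumption that the file fits comfortably in the heap could look like this. InMemoryCache, readFully and md5Hex are illustrative names, and the ByteArrayInputStream in main stands in for the DropBox stream; the actual upload call is left as a comment because the third-party library's API is not shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration.
class InMemoryCache {

    /** Reads the whole stream into a byte array (the cached contents). */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns the MD5 of the given bytes as lowercase hex. */
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        StringBuilder hex = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the DropBox stream (assumption for the demo).
        InputStream download = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));

        byte[] contents = readFully(download);  // read the download exactly once
        String md5 = md5Hex(contents);          // MD5 is known before the upload starts
        InputStream forUpload = new ByteArrayInputStream(contents); // fresh stream over the cache

        System.out.println(md5 + " " + forUpload.available() + " bytes ready");
        // The upload library would be called here with md5 and forUpload.
    }
}
```

The trade-off is that the entire file lives in memory at once, so for very large files a temp file (FileInputStream) or a second download remains the practical choice.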


nakosspy


Thanks for your answer. This is exactly what I am doing at the moment. But I want to avoid using local file to cache. – KKKCoder May 23 '13 at 08:48
If you need to read the stream from DropBox in order to calculate the MD5, then there is no alternative but to open a new InputStream from the file you already downloaded. This is more efficient than what you wanted to do (reuse the DropboxInputStream). Why? Because if you were able to reuse the DropboxInputStream, you would download the file from DropBox twice: once to calculate the checksum and once to upload it. – nakosspy May 23 '13 at 09:07
