
According to a related Stack Overflow question, an InputStream can be read multiple times either with the mark() and reset() methods that InputStream provides or by using a PushbackInputStream.

In all these cases, the content of the stream is stored in a byte array (i.e., the original content of the file is held in main memory) and reused multiple times.

What happens when the size of the file exceeds the available memory? I think this could lead to an OutOfMemoryError.

Is there a better way to read the stream content multiple times without storing it locally (i.e., in main memory)?

Please help me understand this. Thanks in advance.

Tom Taylor
  • You will have to create a new stream. – 4castle Jul 13 '16 at 17:17
  • Can I try it like this? `InputStream is = new FileInputStream("FilePath"); InputStream backupStream = is;` I then read the content from the stream `is` and close it. Will the content still be available in `backupStream`? Please correct me if I am wrong. – Tom Taylor Jul 13 '16 at 17:18

1 Answer


It depends on the source of the stream.

If it's a local file, you can likely re-open and re-read the stream as many times as you want.
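
As a minimal sketch of the local-file case: each call below opens a brand-new stream on the same file, so nothing has to be kept in memory between passes. The class name `ReopenFileExample`, the helper `countBytes`, and the temporary sample file are hypothetical, introduced only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReopenFileExample {

    /** Opens a fresh stream on the file and counts its bytes using a small fixed buffer. */
    static long countBytes(Path file) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created here only so the sketch is self-contained.
        Path file = Files.createTempFile("reopen-example", ".txt");
        try {
            Files.write(file, "hello stream".getBytes(StandardCharsets.UTF_8));
            long firstPass = countBytes(file);   // first full read
            long secondPass = countBytes(file);  // second full read, through a new stream
            System.out.println(firstPass + " " + secondPass);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Memory use stays at the size of the buffer no matter how large the file is, because the file system, not the program, is what remembers the content.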

If it's dynamically generated by a process, a remote service, etc., you might not be free to re-generate it. In that case, you need to store it, either in memory or in some more persistent (and slower) storage like a file system or storage service.
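
One way to do that without holding the whole stream in RAM is to spool it to a temporary file once and then re-open that file for every pass. The sketch below assumes a one-shot source, simulated here with a `ByteArrayInputStream`; the names `SpoolToDiskExample`, `spool`, and `countBytes` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SpoolToDiskExample {

    /** Copies a one-shot stream into a temporary file and returns its path. */
    static Path spool(InputStream oneShot) throws IOException {
        Path temp = Files.createTempFile("spooled-stream", ".tmp");
        // createTempFile already created the file, so the copy must be allowed to replace it.
        Files.copy(oneShot, temp, StandardCopyOption.REPLACE_EXISTING);
        return temp;
    }

    /** Reads the spooled copy through a fresh stream, one small buffer at a time. */
    static long countBytes(Path file) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a network or generated stream that cannot be requested again.
        InputStream upload = new ByteArrayInputStream(
                "uploaded data".getBytes(StandardCharsets.UTF_8));
        Path copy = spool(upload);
        try {
            System.out.println("first pass:  " + countBytes(copy) + " bytes");
            System.out.println("second pass: " + countBytes(copy) + " bytes");
        } finally {
            Files.deleteIfExists(copy);
        }
    }
}
```

The trade-off is disk I/O and cleanup of the temporary file in exchange for a memory footprint that does not grow with the size of the stream.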


Maybe an analogy would help. Suppose your friend is speaking to you at length. You listen carefully without interruption, but when they are done, you realize you didn't understand something they said near the beginning, and want to review that portion.

At this point, there are a few possibilities.

Perhaps your friend was actually reading aloud from a book. You can simply re-read the book.

Or, perhaps you had the foresight to record their monologue. You can replay the recording.

However, since neither you nor your friend has perfect and unlimited recall, simply repeating verbatim what was said ten minutes ago from memory alone is not an option.

An InputStream is like your friend speaking. Neither of you has a good enough memory to remember exactly, word-for-word, what was said. In the same way, neither the process generating the data stream nor your program can count on having enough RAM to store the whole stream, byte-for-byte. To scale, your program has to rely on its "short-term memory" (RAM), working with just a small portion of the whole stream at any given time, and "taking notes" (writing to a persistent store) as it encounters important points.
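
A minimal sketch of that working style: the loop below holds only one small buffer at a time and keeps two running "notes" (a byte count and a CRC-32 checksum) instead of the data itself. The class `ChunkedReadExample`, the `Summary` type, and the choice of checksum are assumptions made for illustration, not something the original scenario prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

class ChunkedReadExample {

    /** The "notes" kept while the stream passes by; their size does not grow with the stream. */
    static final class Summary {
        final long byteCount;
        final long crc32;

        Summary(long byteCount, long crc32) {
            this.byteCount = byteCount;
            this.crc32 = crc32;
        }
    }

    /** Consumes the stream once, holding at most bufferSize bytes of it at any moment. */
    static Summary summarize(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        CRC32 crc = new CRC32();
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            crc.update(buffer, 0, n); // fold this chunk into the running checksum
            total += n;               // after this, the chunk can be overwritten
        }
        return new Summary(total, crc.getValue());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in source; a real stream would come from a socket, a process, and so on.
        InputStream source = new ByteArrayInputStream(new byte[1_000_000]);
        Summary s = summarize(source, 4096);
        System.out.println(s.byteCount + " bytes, crc32=" + Long.toHexString(s.crc32));
    }
}
```

After the loop finishes, the bytes themselves are gone. Only the summary remains, which is why a second pass needs either a repeatable source or a stored copy.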

If the source of the stream is a local file, then it's like your friend reading a book. Either of you can re-read that content easily enough.

If you copy the stream to some persistent storage, that's like recording your friend's speech. You can replay it as often as you like.


Consider a scenario where a browser is uploading a large file, but the server is busy and not able to read that stream for some time. Where is that data stored during that delay?

Because the receiver can't always respond immediately to input, TCP and many other protocols allocate a small buffer to store some data from a sender. But they also have a way to tell a sender that is transmitting too fast to wait: flow control. Going back to the analogy, it's like telling your friend to pause a moment while you catch up with your note-taking.

As the browser uploads the file, at first, the buffer will be filled. But if the server can't keep up, the browser will be instructed to pause its upload until there is more room in the buffer. (This generally happens at the OS and TCP level; the client and server applications don't manage this directly.) The upload speed depends on how fast the browser can read the file from disk, how fast the network link is, and how fast the server can process the uploaded data. Even a fast network and client will be limited by the weak link in this chain.
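
The same idea can be sketched inside a single program with a bounded queue. This is only an in-process analogy and not how TCP itself is implemented: `put` blocks the sender whenever the small buffer is full, and the slow receiver frees space by taking items out. The class `BoundedBufferExample` and the method `transfer` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedBufferExample {

    /** Sends count chunks through a buffer holding at most capacity of them; returns what arrived. */
    static List<Integer> transfer(int count, int capacity) throws InterruptedException {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);

        Thread sender = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    buffer.put(i); // blocks while the buffer is full: the sender is paused
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();

        List<Integer> received = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread.sleep(1);             // simulate a receiver that is slower than the sender
            received.add(buffer.take()); // taking an item frees room, letting the sender resume
        }
        sender.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> received = transfer(20, 4);
        System.out.println("received " + received.size() + " chunks in order: " + received);
    }
}
```

At no point are more than `capacity` chunks waiting in the buffer, yet every chunk arrives in order. The overall rate is set by the slowest participant, here the receiver.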

erickson
  • Yes, you are right, erickson. But can you please explain a bit more why, in the case of a remote service, we need to store it, either in memory or in some persistent storage like a file system? – Tom Taylor Jul 13 '16 at 17:22
  • It is not clear to me why the content is no longer in the stream once it has been read from the stream. I am completely new to streams. Please help me understand this. – Tom Taylor Jul 13 '16 at 17:23
  • @Rajasuba Subramanian: A stream is like a short pipe: it just connects a data source to a data consumer. Therefore it does not store anything by default. If you want to read the data twice, you have to either use a source that provides the data twice or store the data while retrieving it so that you can provide it a second time later. – Robert Jul 13 '16 at 17:35
  • @Robert I am not clear on `A stream just connects a data source to a data consumer`. Kindly help me understand how a stream connects a data source to a data consumer and how this works. I would like to know the logic behind it. – Tom Taylor Jul 13 '16 at 17:39
  • @RajasubaSubramanian I added an example that I hope will help. – erickson Jul 13 '16 at 17:46
  • @Rajasuba Subramanian: There is no magic to understand. An InputStream is just the definition of how to read one `byte` or a `byte[]`. Where the data comes from depends on the actual implementation (file, network, ...). – Robert Jul 13 '16 at 17:49
  • Mine is data transmitted over the network: the user uploads the file content and I persist it in my database. Here we use `Streams` to transmit data over the network. – Tom Taylor Jul 13 '16 at 17:52
  • If the file is on our machine, then I can think of a `Stream` as a pipe that transfers the data. Does this apply to data transmitted over the network as well? – Tom Taylor Jul 13 '16 at 17:54
  • I am assuming a `Stream` is like a pipe opened to read the data sent by the browser or a remote service. The stream opened in the HTTP connection would know where to read the file from (i.e., the stream opened for reading would contain the file reference point). Why shouldn't the stream read it again from the same reference point, since it holds that reference? Am I understanding this `Stream` concept correctly? Please correct me if I am wrong somewhere. – Tom Taylor Jul 13 '16 at 17:57
  • @RajasubaSubramanian In your case, then, the `InputStream` is implemented by HTTP over TCP. Neither of those underlying protocols offers support for "replaying" the data stream. A browser opens the TCP connection to your server, sends appropriate HTTP headers to describe the transmission, sends the data, and closes the TCP socket. The server no longer has any connection to the browser to request a replay, and HTTP doesn't support server-to-client requests anyway. – erickson Jul 13 '16 at 18:13
  • Well said, @erickson. `HTTP doesn't support server-to-client requests` convinces me. – Tom Taylor Jul 13 '16 at 18:24
  • Here is another question: what is the difference between an HTTP header value and a stream value? A value set in a header can be read any number of times, whereas the same does not apply to streams. Why? Please help me understand this. – Tom Taylor Jul 13 '16 at 18:26
  • @RajasubaSubramanian When you use an API like `HttpServletRequest` to read a header multiple times, the implementation has read part of the stream from the browser, and stored important pieces of information like header names and values in RAM. So you can read it over and over. But it doesn't store everything that was in the stream, like whitespace between tokens. And if the client sent too many headers, the implementation would eventually fail because it assumes only a small amount of header data, while the body can be indefinitely long. – erickson Jul 13 '16 at 18:43
  • Yes @erickson, I agree with you. But still, if a `Stream` transmits data without storing it, our server has to receive the data and store it somewhere. Yet even when the server is not ready to read the stream, the content is available in the stream instance throughout the connection. How? Where is the content kept until the server reads the data? Please help me understand this. – Tom Taylor Jul 13 '16 at 18:55
  • @RajasubaSubramanian Your last question is not quite clear. I made a guess at what you meant and updated my answer. – erickson Jul 13 '16 at 19:16
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/117256/discussion-between-rajasuba-subramanian-and-erickson). – Tom Taylor Jul 13 '16 at 19:34