I am querying an API over HTTP that returns a huge response. The response is obtained as an InputStream. It turns out that merely reading the characters iteratively is quite slow: approximately 16.5 s for 750k characters.

On the other hand, if I create a 750k-character string, get a stream from new ByteArrayInputStream(string.getBytes()), and iteratively call read() until it returns -1, it finishes in approximately 0.02 s.

I'm currently calling IOUtils.toString, but my attempts at doing it manually (using a BufferedReader wrapping an InputStreamReader) yield the same result.

// Build a 750k-character string entirely in memory
StringBuilder outputBuffer = new StringBuilder(750000);
for (int i = 0; i < 750000; i++) {
   outputBuffer.append(i % 100 == 0 ? "\n" : "a");
}
String myString = outputBuffer.toString();
InputStream theStreamIBuilt = new ByteArrayInputStream(myString.getBytes());
// Time only the read from the in-memory stream
long start = System.nanoTime();
String body = IOUtils.toString(theStreamIBuilt, encoding);
System.out.println("Time : " + ((double) (System.nanoTime() - start)) / 1000000000);
System.out.println("Number of chars :" + body.length());

This yields 0.02 s.

URLConnection con = (new URL(url)).openConnection();
InputStream theStreamIReceive = con.getInputStream();
// Time only the read from the HTTP response stream
long start = System.nanoTime();
String body = IOUtils.toString(theStreamIReceive, encoding);
System.out.println("Time : " + ((double) (System.nanoTime() - start)) / 1000000000);
System.out.println("Number of chars :" + body.length());

This yields 16.74 s. The response was the same size as the string above.

There's something I'm fundamentally missing here, most likely regarding the nature of the InputStream of an HTTP response. What is it? Where does the speed difference come from?

Julien BERNARD
  • Please create a [proper](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) benchmark – Lino Jul 04 '18 at 11:11
  • Err, the network I/O? Your first example doesn't read anything whatsoever. It just creates a string of lines containing nothing but a. Why shouldn't it be faster than network I/O? Your comparison is totally invalid. – user207421 Jul 04 '18 at 11:27
  • Obviously it is slower to read from the network than from memory, and that's why you observe one being slower than the other. But also note that when you read each byte separately with read(), performance will be much better if you wrap your InputStream in a BufferedInputStream. Of course, it would be even more efficient to read the bytes in bulk rather than one by one. – kumesana Jul 04 '18 at 12:50
  • Thank you for your answers. It seems I'm misunderstanding something: when I store the initial time, I'm assuming the HTTP response has come back and is somewhere in memory, waiting to be read. In other words, that the network I/O work happened during the openConnection() and con.getInputStream() calls (which take 0.25 s to execute, by the way). Assuming this is correct, I don't get what the difference is between my two cases. Where is my understanding wrong? – Julien BERNARD Jul 04 '18 at 13:23
  • Your assumption was wrong. The server may have started transferring data to the client before you called `con.getInputStream()`. But when you called `start = nanoTime();`, only a small amount of the data, or none of it, had been received by the client. – SKi Jul 04 '18 at 13:48
  • @SKi Thanks, I get it. I assumed the connection calls were synchronous. So the read calls are actually delayed by the data not being available yet, and the asynchronous nature explains that an InputBuffer is returned. – Julien BERNARD Jul 04 '18 at 13:59
  • They *are* synchronous. You issue a `read()`: a read happens. Your assumption was that everything had been transferred by the time `getInputStream()` returned. You're making a mountain out of a molehill here. – user207421 Jul 04 '18 at 20:32
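
The point made in the comments can be reproduced locally. What follows is only a sketch under stated assumptions, not the setup from the question: it starts a throwaway server with the JDK's `com.sun.net.httpserver.HttpServer` that sends its body in delayed pieces, then times `getInputStream()` separately from the read loop. The class name `SlowBodyDemo`, the `/slow` path, and the chunk count, chunk size and delay are all invented for illustration. The read loop pulls bytes in bulk with a buffer, as kumesana suggests.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.util.Arrays;

class SlowBodyDemo {
    // Hypothetical values, chosen only to make the effect visible.
    static final int CHUNKS = 5;
    static final int CHUNK_SIZE = 1000;
    static final long DELAY_MS = 100;

    /** Returns { ms until getInputStream() returned, ms spent reading the body, body length }. */
    static long[] measure() throws IOException {
        // Port 0 lets the OS pick a free port for this throwaway server.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/slow", exchange -> {
            // Length 0 means chunked transfer: headers go out now, the body follows later.
            exchange.sendResponseHeaders(200, 0);
            try (OutputStream out = exchange.getResponseBody()) {
                byte[] chunk = new byte[CHUNK_SIZE];
                Arrays.fill(chunk, (byte) 'a');
                for (int i = 0; i < CHUNKS; i++) {
                    out.write(chunk);
                    out.flush();
                    Thread.sleep(DELAY_MS); // simulate a slow server or a slow network
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/slow");
            long t0 = System.nanoTime();
            URLConnection con = url.openConnection();
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            long t1;
            try (InputStream in = con.getInputStream()) {
                // getInputStream() has returned, but the body is still being sent.
                t1 = System.nanoTime();
                byte[] buf = new byte[8192];
                int n;
                // Each read() blocks until some bytes arrive or the stream ends.
                while ((n = in.read(buf)) != -1) {
                    body.write(buf, 0, n);
                }
            }
            long t2 = System.nanoTime();
            return new long[] { (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, body.size() };
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] r = measure();
        System.out.println("getInputStream() returned after " + r[0] + " ms");
        System.out.println("Reading the body took " + r[1] + " ms for " + r[2] + " bytes");
    }
}
```

With these made-up delays, most of the elapsed time should land in the read loop rather than in `getInputStream()`, which is the same shape as the measurement in the question: the reads are synchronous, but they wait for data that has not arrived yet.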
