3

I'm trying to read from an input stream of a HttpURLConnection:

InputStream input = conn.getInputStream();
InputStreamReader isr = new InputStreamReader(input);
BufferedReader br = new BufferedReader(isr);

StringBuilder out = new StringBuilder("");
String output;
while ((output = br.readLine()) != null) {
    out.append(output);
}

This takes too much time when the input stream contains a lot of data. Is it possible to optimize it?

yo_haha
  • 359
  • 1
  • 4
  • 15
  • 10
    What makes you think the bottleneck is in your code, vs the time taken for the data to arrive over the network? – Jon Skeet Jan 20 '15 at 16:50
  • You can test with larger/smaller buffer sizes for your `BufferedReader`. You're otherwise limited by the network. – Sotirios Delimanolis Jan 20 '15 at 16:51
  • Is "too much" actually more than when downloading the resource with other means (ftp, browser, wget etc.)? Measure, don't guess. How many KB/s do you expect and how many do you get? Note that constructing huge StringBuilders may take a lot of time. There's a lot of array copying and memory allocating if it's all in-memory. – Petr Janeček Jan 20 '15 at 16:54
  • Apart from what's been mentioned before, compression might be an option but of course that has to be implemented on both sides of the connection. – biziclop Jan 20 '15 at 16:56
  • Storing this in a StringBuilder is OK, but you should provide an initial allocation according to what you expect. Otherwise the extend/copy happens when the size exceeds every power of 2. – laune Jan 20 '15 at 16:56
  • @JonSkeet Don't you think that the StringBuilder reallocation strategy might throw a spanner into the works? – laune Jan 20 '15 at 16:58
  • Also, reading line by line is a slow way of reading bulk data (see the sketch after this comment thread). – laune Jan 20 '15 at 16:59
  • @laune In theory it could but the only way to know is to measure. – biziclop Jan 20 '15 at 16:59
  • 1
    @laune: I think it's unlikely to be slower than the network... – Jon Skeet Jan 20 '15 at 17:00
  • 1
    @biziclop I have actually seen the double/copy slow things down. It's one thing to check - therefore a comment, not an answer ;-) – laune Jan 20 '15 at 17:01
  • Process the data as you read it and don't discard new lines, they might be useful for something. ;) – Peter Lawrey Jan 20 '15 at 17:16
  • @yo_haha How much is "a lot" really? – laune Jan 20 '15 at 17:21
  • I made some measurements. Without reading data from the stream (without the while loop), the code takes 12 to 13 seconds to execute. With the while loop, it takes 50 to 70 seconds. Using the BufferedReader makes execution way slower. – yo_haha Jan 20 '15 at 21:52
  • The JSON object I get from the request really does contain a lot of data. – yo_haha Jan 20 '15 at 21:54
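
A minimal sketch of the bulk-read approach suggested in the comments above: read into a char[] buffer instead of line by line, and give the StringBuilder an initial capacity so it doesn't repeatedly reallocate (the 8192-char buffer and the ~1 MB initial capacity are assumptions, not measured values):

InputStreamReader isr = new InputStreamReader(conn.getInputStream());
StringBuilder out = new StringBuilder(1 << 20); // assumed initial capacity (~1 MB); size to the expected payload
char[] buffer = new char[8192];                 // assumed chunk size; read in bulk instead of line by line
int n;
while ((n = isr.read(buffer)) != -1) {
    out.append(buffer, 0, n);
}
isr.close();

Unlike readLine(), this also keeps the newline characters, which the original loop silently dropped.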

3 Answers

3

Maybe this will be a bit faster, because the new Stream API in Java 8 can process the lines in parallel:

package testing;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.stream.Stream;

public class StreamTest {

  /**
   * @param args the command line arguments
   * @throws java.io.IOException
   */
  public static void main(String[] args) throws IOException {
    URL url = new URL("http://www.google.com");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setUseCaches(false);
    if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
      BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));

      Stream<String> s = br.lines();
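      // note: parallel() does not preserve encounter order; use forEachOrdered if line order matters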
      s.parallel().forEach(System.out::println);      
    }
  }

}
aw-think
  • 4,723
  • 2
  • 21
  • 42
  • Then you probably already have the fastest implementation. Maybe a concurrent read would be an alternative: http://stackoverflow.com/questions/16159183/java-concurrent-reads-on-an-inputstream and this will be helpful if you decide to switch to Java 8: http://stackoverflow.com/questions/1605332/java-nio-filechannel-versus-fileoutputstream-performance-usefulness – aw-think Jan 21 '15 at 07:41
  • Thanks for the tips. In my code, bufferedReader.readLine() is the call that takes most of the execution time – yo_haha Jan 21 '15 at 11:21
  • It's not faster, in fact. It still costs about 10 s for 20 KB of string – Slim_user71169 Aug 27 '15 at 03:59
0

There's nothing slow about this code. You can read millions of lines a second with it, if the input arrives fast enough. Your time probably isn't spent reading the input stream at all, but either blocking while waiting for input or appending to the StringBuilder.

But you shouldn't be doing this at all. Most files can be processed a line at a time or a record at a time. Compilers process them a token at a time, and there aren't many file-processing tasks more complex than compilation, so processing input piecewise is certainly possible.
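
As a sketch of that line-at-a-time idea (handleLine below is a hypothetical placeholder for whatever per-line processing you need; conn is the question's HttpURLConnection):

try (BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
    String line;
    while ((line = br.readLine()) != null) {
        handleLine(line); // hypothetical handler; no need to accumulate the whole response
    }
}

This keeps memory use flat regardless of response size, since nothing is accumulated.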

user207421
  • 305,947
  • 44
  • 307
  • 483
  • 1
    If you say "time...is spent...in appending to the StringBuilder": aren't you contradicting your statement "There's nothing slow about this code."? – laune Jan 20 '15 at 17:21
  • I made some measurements: waiting for the input takes 12-13 seconds, and reading the stream + building the StringBuilder takes 40-55 seconds. – yo_haha Jan 20 '15 at 22:00
0

In Java, InputStream has the method read(byte[] b, int off, int len), which reads from the input stream into the given byte array. Here off is the starting offset in the array, len is the maximum number of bytes to read, and b is the destination array. read will attempt to read up to len bytes, but it returns the number of bytes actually read, which may be fewer than requested. Here is an example:

FileInputStream i = new FileInputStream("file path");
FileOutputStream o = new FileOutputStream("file path");
byte[] a = new byte[1024];
// keep copying until read() returns -1 (end of stream)
for (int j; (j = i.read(a, 0, 1024)) != -1; ) {
    o.write(a, 0, j); // write only the j bytes actually read
}
i.close();
o.close();