I'm calling a webservice that returns a large response, about 59 megabytes of data. This is how I read it from Java:

        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(),"UTF-8"));
        result = result.concat(this.getResponseText(in));

private String getResponseText(BufferedReader in) throws IOException {
    StringBuilder response = new StringBuilder(Integer.MAX_VALUE/2);

    System.out.println("Started reading");
    String line = "";
    while((line = in.readLine()) != null) {
        response.append(line);
        response.append("\n");
    }

    in.close();

    System.out.println("Done");
    String r = response.toString();
    System.out.println("Built r");

    return r;
}

While the data is being read, Windows Resource manager shows a throughput of about 100,000 bytes per second.

However, when I read exactly the same data from the same web service in Python, i.e.:

response = requests.request("POST", url, headers=headers, verify=False, json=json)

I see a throughput of up to 700,000 bytes per second (about 7 times faster), and the code also finishes about 7 times sooner.

The question is: am I missing something that would make the reads in Java faster? Is this really the fastest way to read an HTTP response in Java?

Update: even when I don't store anything and just iterate over the response, I still get at most 100,000 bytes per second, so I believe the bottleneck is somewhere in the way Java reads:

private List<String> getResponseTextAsList(BufferedReader in) throws IOException {
    System.out.println("Started reading");

    List<String> l = new ArrayList<String>();
    int i = 0;
    long q = 0;
    String line = "";
    while((line = in.readLine()) != null) {
        //l.add(line);
        i++;
        q = q+line.length();
    }

    in.close();

    System.out.println("Done" + i + " " + q);

    return l;
}
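
To take Resource manager out of the picture, the read rate can also be measured inside the program itself. The following is only a minimal sketch: the class name `ThroughputProbe` and the method `drain` are made up for illustration, and an in-memory stream stands in for `conn.getInputStream()` so the example runs without a network connection. It reads raw bytes in 64 KiB chunks, with no character decoding or line splitting, and reports bytes per second.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the original code.
class ThroughputProbe {

    /** Reads the stream to its end in 64 KiB chunks and returns the number of bytes seen. */
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int n;
        // read() returns the number of bytes placed in the buffer, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 1 MiB of in-memory data stands in for conn.getInputStream().
        InputStream in = new ByteArrayInputStream(new byte[1 << 20]);

        long start = System.nanoTime();
        long bytes = drain(in);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.printf("%d bytes in %.6f s (%.0f bytes/s)%n", bytes, seconds, bytes / seconds);
    }
}
```

With the real connection, `drain(conn.getInputStream())` would be timed the same way. If the rate stays near 100,000 bytes per second even here, the readers and string handling are probably not the limiting factor.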
  • I'm not sure what your specific question is. But I can confirm that many (not all) tasks can be run faster and with much less code in Python than in Java. – Klaus D. Jul 26 '18 at 18:12
  • The specific question is: is the way I read it in Java really the best and fastest possible way to do it? – Adam Muller Jul 26 '18 at 18:22
  • You should edit your question. – Klaus D. Jul 26 '18 at 18:25
  • Might this depend on how fast the packets are arriving? Using Windows Resource manager how did you profile this (how many times, what control)? Also Java has a slow-start where Python does not. – xtratic Jul 26 '18 at 18:39
  • I’m betting you don’t need to return it as a single huge String. Whatever processing you’re performing on that response body, do it in pieces (such as one line at a time). – VGR Jul 26 '18 at 18:59
  • See update above. In the Resource manager (sorry, I can't post pictures on SO yet) I simply went to Overview -> Network and compared java.exe with python.exe. – Adam Muller Jul 27 '18 at 14:58
  • You may rather want to measure the actual time (`System.currentTimeMillis()` before and after the transfer). (And do the same with Python) – tevemadar Jul 27 '18 at 15:06
  • Why do you read the data line-wise in Java and not in Python? – maaartinus Jul 27 '18 at 15:16
  • What is `conn` ? – PeterMmm Jul 27 '18 at 16:27
  • conn is an instance of java.net.HttpURLConnection. @maaartinus how else (other than line by line) can I read in Java? An example would be appreciated. – Adam Muller Jul 27 '18 at 18:53
  • @AdamMuller You can read bytes instead of chars. Strings in Java are internally more or less UTF-16 encoded, which adds quite some overhead (no longer always true as of Java 9). In any case, there is a `byte[] -> char[]` conversion. You can [read arrays of chars](https://docs.oracle.com/javase/8/docs/api/java/io/BufferedReader.html#read-char:A-int-int-). – maaartinus Jul 27 '18 at 19:22
  • You're preallocating 1024 million chars, i.e., 2 GB memory, where one tenth would easily do. – maaartinus Jul 27 '18 at 19:26
  • @maaartinus I updated the question above: even though I'm not building a String or anything else on my side (just reading the data and incrementing a counter for each line), the performance is still bad. Reading in chunks of bytes does not seem safe if my data contains Unicode (risk of splitting a character), unless there is a way to read all bytes at once. Anyway, I gave up and switched to Python for this particular task. – Adam Muller Jul 27 '18 at 22:49
  • @AdamMuller Right, you're not building the string, just splitting the input into lines that are thrown away. That is still quite some processing. Reading in chunks is no problem as long as you either stay in the BMP (which contains even [CJK](https://en.wikipedia.org/wiki/CJK_characters)) or re-assemble the chars before further processing. `+++` I guess the problem you're facing is Java's slow start: the code is compiled on the fly, and Java is a poor fit for short tasks. This shouldn't matter much for something as big as 59 MB, but there's [OSR](https://stackoverflow.com/a/9105846/581205). – maaartinus Jul 28 '18 at 01:24
  • @AdamMuller AFAIK, your Python code is equivalent to `byte[] json = new byte[exactLength]; inputStream.read(json);` in Java, which should be much faster. It can still be slow, as some internal machinery may need to warm up, too. Repeating the task could bring you to full speed... maybe. +++ My Java web server, running in a mediocre virtual machine, easily serves 4 MB/s. – maaartinus Jul 28 '18 at 01:32
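
The byte-oriented approach suggested in the comments can be sketched roughly as follows. This is a minimal illustration rather than a tested replacement for the question's code: `BulkResponseReader` and `readAllAsUtf8` are hypothetical names, and an in-memory stream stands in for `conn.getInputStream()`. A single `inputStream.read(json)` call may return fewer bytes than the array holds, so the sketch loops until end of stream. Decoding happens once, after all bytes have been collected, so a multi-byte UTF-8 character that straddles two chunks is not split.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original code.
class BulkResponseReader {

    /** Collects all bytes of the stream in 64 KiB chunks, then decodes them as UTF-8 in one step. */
    static String readAllAsUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream(64 * 1024);
        byte[] buffer = new byte[64 * 1024];
        int n;
        // read() may fill only part of the buffer; n is the number of valid bytes.
        while ((n = in.read(buffer)) != -1) {
            collected.write(buffer, 0, n);
        }
        // Decoding the complete byte array avoids cutting a character at a chunk boundary.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in-memory sample data stands in for conn.getInputStream().
        byte[] sample = "gr\u00fc\u00dfe\n\u65e5\u672c\n".getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        String body = readAllAsUtf8(new ByteArrayInputStream(sample));
        long micros = (System.nanoTime() - start) / 1_000;

        System.out.println("Decoded " + body.length() + " chars in " + micros + " microseconds");
    }
}
```

The output buffer grows on demand instead of preallocating `Integer.MAX_VALUE/2` chars up front. Whether this closes the gap to the Python client depends on where the time is actually spent, which the thread does not settle.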
