
I wrote a utility method that reads a small amount of data from a stream into a String.

Which implementation performs better?

  1. Write all the data to a byte array, then convert it to a String in one step.

OR

  2. Convert each buffered chunk to a String and concatenate the results.

Implementation 1:

private String fileToString() throws ... {
    final byte[] buffer = new byte[bufLen];
    int n;
    
    final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    while ((n = fileInputStream.read(buffer)) != -1)
        byteArrayOutputStream.write(buffer, 0, n);
    
    return new String(byteArrayOutputStream.toByteArray(), "UTF-8");
}

Implementation 2:

private String fileToString() throws ... {
    final byte[] buffer = new byte[bufLen];
    int n;
    
    final StringBuilder stringBuilder = new StringBuilder(aProperValue);
    while ((n = fileInputStream.read(buffer)) != -1)
        stringBuilder.append(new String(buffer, 0, n, "UTF-8"));
    
    return stringBuilder.toString();
}

EDIT:

The second implementation is not correct! I was wrong! See my answer below.
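
The boundary problem can be reproduced with a minimal sketch, assuming a three-character Greek string (two bytes per character in UTF-8) and a 3-byte buffer chosen so that a chunk ends in the middle of a character. The class and method names (`ChunkedDecodeDemo`, `decodeOnce`, `decodePerChunk`, `run`) are hypothetical, and `StandardCharsets.UTF_8` stands in for the `"UTF-8"` charset name used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from the question.
class ChunkedDecodeDemo {

    // Same idea as Implementation 1: collect every byte, then decode once.
    static String decodeOnce(InputStream in, int bufLen) throws IOException {
        final byte[] buffer = new byte[bufLen];
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = in.read(buffer)) != -1)
            out.write(buffer, 0, n);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Same idea as Implementation 2: decode each chunk on its own.
    // A chunk that ends inside a multi-byte character is decoded incorrectly.
    static String decodePerChunk(InputStream in, int bufLen) throws IOException {
        final byte[] buffer = new byte[bufLen];
        final StringBuilder sb = new StringBuilder();
        int n;
        while ((n = in.read(buffer)) != -1)
            sb.append(new String(buffer, 0, n, StandardCharsets.UTF_8));
        return sb.toString();
    }

    // Runs both variants over the same bytes.
    // Returns { result of decodeOnce, result of decodePerChunk }.
    static String[] run(String text, int bufLen) {
        final byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try {
            return new String[] {
                decodeOnce(new ByteArrayInputStream(data), bufLen),
                decodePerChunk(new ByteArrayInputStream(data), bufLen)
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Greek alpha, beta, gamma: 6 bytes in UTF-8, so a 3-byte buffer splits beta.
        final String text = "\u03B1\u03B2\u03B3";
        final String[] results = run(text, 3);
        System.out.println("decode once round-trips:  " + text.equals(results[0]));
        System.out.println("per-chunk round-trips:    " + text.equals(results[1]));
    }
}
```

With these inputs the first comparison is true and the second is false, because the split character is replaced by U+FFFD replacement characters. With an even buffer length the same input happens to decode correctly, which is why a quick test can miss the bug.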

Mir-Ismaili
  • @ThomasS. Perhaps will do it. But it would be good to make it **accessible for all on the web**. – Mir-Ismaili Dec 18 '17 at 20:36
  • The second one is incorrect: it might read half a character (since UTF8 encodes some characters to several bytes) and try to transform this half sequence to a character. Why don't you use a Reader to read characters? That's what they're for. – JB Nizet Dec 18 '17 at 20:37
  • Don't forget to read about [how to write a correct micro-benchmark](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) before you try it. – azurefrog Dec 18 '17 at 20:38
  • https://javapapers.com/java/java-string-vs-stringbuilder-vs-stringbuffer-concatenation-performance-micro-benchmark/ – Maytham Fahmi Dec 18 '17 at 20:40
  • I think you'll likely find that for small strings, both approaches perform about the same. This is probably a micro-optimization. – cdhowie Dec 18 '17 at 20:47
  • What are you reading *from*? The performance of your IO device(s) will likely dominate the performance of your entire application. – Andrew Henle Dec 18 '17 at 21:00
  • @JBNizet. No. From the documentation of `String (byte[], int, int, String)`: "The length of the new String is a function of the charset, and hence may not be equal to the length of the subarray". Also I tested it: `new String("αβγδε".getBytes(), 0, "αβγδε".getBytes().length, "UTF-8")` is equal to `"αβγδε"`. – Mir-Ismaili Dec 18 '17 at 21:01
  • @Mir-Ismaili `fileInputStream.read(buffer)` reads *bytes*, not *UTF-8 characters*, so a single read can indeed end in the middle of a multi-byte UTF-8 character. – Andrew Henle Dec 18 '17 at 21:06
  • @Mir-Ismaili you didn't get me. Suppose your String has two chars. Suppose encoding the first char to UTF8 gives the bytes [192, 128], and the encoding of the second gives [193, 129]. Now suppose that, when reading these bytes, you first get [192, 128, 193]. You'll transform these three bytes to a String, thus trying to decode the byte 193 as a character, which is invalid. – JB Nizet Dec 18 '17 at 21:07
  • @JBNizet. OK. Thank you very much. The issue is clear now! Only the first way is correct. I was wrong. – Mir-Ismaili Dec 18 '17 at 21:13

2 Answers


The second implementation is wrong. It breaks when a buffer boundary falls inside a multi-byte character! Thanks to @JB Nizet and @Andrew Henle. See their comments under my question.
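
One way to keep reading in chunks without hitting the boundary problem is to let a `Reader` do the decoding, as suggested in the comments. The following is a minimal sketch, assuming UTF-8 input; the class and method names (`ReaderToString`, `readAll`, `roundTrip`) are hypothetical, and `roundTrip` exists only to exercise the method on an in-memory stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class ReaderToString {

    // Reads the whole stream as UTF-8 text.
    // InputStreamReader keeps incomplete byte sequences between reads,
    // so a character split across two underlying reads is still decoded whole.
    static String readAll(InputStream in, int bufLen) throws IOException {
        final char[] buffer = new char[bufLen];
        final StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1)
                sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    // Encodes text to UTF-8 bytes and reads it back through readAll.
    static String roundTrip(String text, int bufLen) {
        final byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try {
            return readAll(new ByteArrayInputStream(data), bufLen);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Greek alpha, beta, gamma; a tiny buffer forces many small reads.
        final String text = "\u03B1\u03B2\u03B3";
        System.out.println(text.equals(roundTrip(text, 1)));
    }
}
```

Here the buffer holds `char` values rather than bytes, so its size only affects how many reads are needed, not whether the text decodes correctly.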

Mir-Ismaili

The best way is to use a library. I tried to solve the same problem myself, but the performance was really slow. Consider using `CharStreams.toString` from the Guava library; the performance improvement is large enough to notice without measuring.

krund