
I'm wondering what the objections are to using what I'll call the 'String constructor method' to convert an InputStream into a String.

Edit: added emphasis. In particular, I'm wondering why we have to mess with Streams and Buffers and Scanners and whatnot when this method seems to work fine.

private String readStream(InputStream in) {
    byte[] buffer = new byte[1024];
    try {
        return new String(buffer, 0, in.read(buffer));
    } catch (IOException e) {
        Log.d(DEBUG_TAG, "Error reading input stream!");
        return "";
    }
}
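The method above converts only what a single `read()` call returns. `InputStream.read(byte[])` only guarantees to block until at least one byte is available and then read *some* bytes (or return -1 at end of stream). A network stream can therefore deliver a response across several calls, and anything past 1024 bytes is never read at all. Here is a minimal sketch that simulates a slow connection with a hypothetical `TrickleInputStream`, which hands out one byte per call:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PartialReadDemo {

    // Hypothetical stream for illustration: returns at most one byte per call,
    // which the InputStream contract allows (slow sockets can behave similarly).
    static class TrickleInputStream extends ByteArrayInputStream {
        TrickleInputStream(byte[] data) { super(data); }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // The question's single-read approach (logging and catch block removed,
    // default charset kept as in the original).
    static String readOnce(InputStream in) throws IOException {
        byte[] buffer = new byte[1024];
        return new String(buffer, 0, in.read(buffer));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        // Only the bytes returned by the first read() call are converted.
        System.out.println(readOnce(new TrickleInputStream(data))); // prints "h"
    }
}
```

On an empty stream, `read()` returns -1 and `new String(buffer, 0, -1)` throws an `IndexOutOfBoundsException`, which the `catch (IOException e)` branch does not cover either.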

I've seen this other helpful post and tried the methods I could:

  • Method 1, Apache Commons, is a no-go, since I can't (and don't want to) use external libraries right now.
  • Method 2, the Scanner one, looks promising, but then you'd need to be able to set delimiters in the stream, which isn't always possible, right? E.g., right now I'm using an InputStream from a web API.
  • Method 3, the InputStreamReader in the slurp function, didn't work either: it gives me a bunch of numbers, even though I'm sending a string with all kinds of characters, so I may be getting the encoding wrong.
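On the Scanner method, note that the delimiter is a setting on the `Scanner`, not on the stream, so nothing has to be changed on the `InputStream` itself. Here is a minimal sketch of the common `"\\A"` idiom, which assumes UTF-8 input (the class name `ScannerSlurp` is only for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ScannerSlurp {

    // "\\A" matches only the beginning of the input, so next() returns everything
    // up to the end of the stream. The delimiter is configured on the Scanner,
    // not on the underlying InputStream.
    static String readStream(InputStream in) {
        Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A");
        // hasNext() is false for an empty stream; next() would throw there.
        return scanner.hasNext() ? scanner.next() : "";
    }

    public static void main(String[] args) {
        byte[] data = "line one\nline two".getBytes(StandardCharsets.UTF_8);
        System.out.println(readStream(new ByteArrayInputStream(data)));
    }
}
```

The sketch does not close the `Scanner` on purpose, because closing it would also close the underlying stream, which it leaves to the caller.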

But after many Google searches, I finally found the String constructor method, which is the only one that works for me.

From comments on the thread I linked, I know there are encoding issues with the method I'm using. I've been coding for a while now and know what encodings are and why they exist. But I still know little about which encodings are used where, and how to detect and handle them. Any resources/help on that topic would also be much appreciated!
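The String constructor method always decodes bytes with *some* charset, so the same bytes can produce different text depending on which one is used. If no charset is passed, `new String(byte[])` uses the platform default charset, which varies between systems. Here is a minimal sketch (class name illustrative) that decodes UTF-8 bytes once with the right charset and once with the wrong one:

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // "é" (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        // Decoding with the same charset round-trips correctly: "héllo".
        String right = new String(utf8, StandardCharsets.UTF_8);

        // Decoding with a different charset silently produces mojibake:
        // ISO-8859-1 maps 0xC3 to 'Ã' and 0xA9 to '©', giving "hÃ©llo".
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.length() + " " + wrong.length()); // prints "5 6"
    }
}
```

No exception is thrown in the wrong case, which is why encoding bugs often go unnoticed until non-ASCII text shows up.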

bbill
  • `InputStream.read(byte[])` is not required to read all the bytes (it may return after reading only some of them), so depending on what you are trying to achieve, this approach may not work. – Sotirios Delimanolis Apr 07 '14 at 18:00
  • As a general rule, just use UTF-8. Always. No exceptions. Do it. – Louis Wasserman Apr 07 '14 at 18:00
  • You have the option to slurp all the bytes into a `ByteArrayOutputStream`, then try to decode them using a `CharsetDecoder`. If that succeeds, calling `.toString()` on the resulting `CharBuffer` gives you your string. Note that with a `CharsetDecoder` configured strictly (`CodingErrorAction.REPORT`), you can detect decoding errors, whereas by default the `String` constructor _replaces_ unmappable byte sequences. – fge Apr 07 '14 at 18:34
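The difference between strict decoding and the `String` constructor shows up with a single invalid byte. Here is a minimal sketch, assuming the input is meant to be UTF-8 (class and method names are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {

    // Decodes UTF-8 strictly: malformed or unmappable input raises an exception
    // instead of being silently replaced.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // 'o', 'k', then 0xFF, a byte that never appears in well-formed UTF-8.
        byte[] bad = { 0x6F, 0x6B, (byte) 0xFF };

        // The String constructor substitutes U+FFFD for the bad byte.
        String lenient = new String(bad, StandardCharsets.UTF_8);
        System.out.println(lenient.endsWith("\uFFFD")); // prints "true"

        try {
            decodeStrict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder rejected the input");
        }
    }
}
```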


Here is one method using only standard libraries:

  • use a ByteArrayOutputStream and copy all the bytes you receive into it;
  • wrap this ByteArrayOutputStream's bytes into a ByteBuffer;
  • use a CharsetDecoder to decode the ByteBuffer into a CharBuffer;
  • call .toString() on the resulting CharBuffer (the decoder returns it already positioned at its start, so no rewinding is needed).

Code (note: doesn't handle closing the input):

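import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Throws CharacterCodingException (a subclass of IOException) if the bytes are not valid UTF-8
private static String readStream(final InputStream in) throws IOException {
    // Step 1: read all the bytes; read() may return fewer bytes than requested,
    // so loop until it signals end of stream with -1
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    final byte[] buffer = new byte[8192];

    int count;

    while ((count = in.read(buffer)) != -1)
        out.write(buffer, 0, count);

    // Step 2: wrap the array
    final ByteBuffer byteBuffer = ByteBuffer.wrap(out.toByteArray());

    // Step 3: decode, failing on malformed or unmappable input instead of replacing it
    final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .onMalformedInput(CodingErrorAction.REPORT);

    // decode() returns a buffer already positioned at its start; calling flip()
    // here would set its limit to 0 and yield an empty string
    final CharBuffer charBuffer = decoder.decode(byteBuffer);

    return charBuffer.toString();
}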
fge
  • Thank you for the thorough answer! I'm a little unfamiliar with Streams and Buffers, though, so I don't see the advantage of using the ByteArrayOutputStream and ByteBuffer over reading into a byte[] and using the String constructor. Does this have to do with what @Sotirios mentioned in his comment on my question? (Edit: I've added some emphasis in my question) – bbill Apr 07 '14 at 19:18
  • That and also the fact that by default the `String` constructor will, as I commented, replace sequences of bytes it cannot map. You can obtain the same behaviour from a `CharsetDecoder` by setting the `CodingErrorAction`s to `REPLACE` (this is what `String` does) – fge Apr 07 '14 at 19:19
  • All right. I've finally run into the problem @Sotirios described. One last question, though: apart from the byte replacement, it would be pretty much equivalent if I looped, converting each chunk I read from the InputStream into a String and appending it until read() returned -1, right? (I think that may have been one of the methods in the link, actually) – bbill Apr 08 '14 at 06:21
  • No, it wouldn't be equivalent. Say you have a 1024-byte buffer, and at offset 1022 (offsets start from 0) you have a 4-byte UTF-8 sequence. You'd botch it if you converted those 1024 bytes immediately. To ensure a full read, you can use nio's AsynchronousChannel – fge Apr 08 '14 at 06:24
  • Makes sense! `StandardCharsets.UTF_8` isn't available until API level 19 (Java 7), though. I'm using ByteArrayOutputStream.toByteArray and a String constructor for now :) – bbill Apr 08 '14 at 06:41
  • `StandardCharsets.UTF_8` may not be available but `Charset.forName("UTF-8")` is ;) – fge Apr 08 '14 at 06:47
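The chunk-boundary problem fge describes can be reproduced without a network by slicing a byte array into fixed-size chunks. Here is a minimal sketch (class and method names are illustrative). It uses `Charset.forName("UTF-8")` so it also compiles where `StandardCharsets` is unavailable:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

public class ChunkBoundaryDemo {

    // Charset.forName works on Java 6 / older Android, unlike StandardCharsets.
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Converts each chunk to a String separately: breaks when a multi-byte
    // sequence straddles a chunk boundary.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, UTF8));
        }
        return sb.toString();
    }

    // Collects all chunks first and decodes once, as in the answer.
    static String decodeAll(byte[] data, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            out.write(data, off, Math.min(chunkSize, data.length - off));
        }
        return new String(out.toByteArray(), UTF8);
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, whose UTF-8 encoding is the 4 bytes F0 9F 98 80.
        String text = "a\uD83D\uDE00";
        byte[] data = text.getBytes(UTF8);

        // With 3-byte chunks the 4-byte sequence is split, so both halves are malformed.
        System.out.println(decodePerChunk(data, 3).equals(text)); // prints "false"
        System.out.println(decodeAll(data, 3).equals(text));      // prints "true"
    }
}
```

The per-chunk version does not throw. The split sequence just turns into U+FFFD replacement characters, which is easy to miss in testing with ASCII-only data.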