Reading InputStream into byte array and then converting to String problem

Question

I get the following log messages from this code, and I'm not sure why I am not getting the output in the string. Can someone advise if I am converting something incorrectly? len has a value, so I know the InputStream is good. location is also verified as good.

The file I'm reading is 42.7 KB.

FileServices fs = new FileServices("");
InputStream in = fs.getInputStreamFromVault(location);

int len;
String strFileContents = "";

logger.info(logPrefix + "file1 ");
BufferedInputStream inBuff = new BufferedInputStream(in);
logger.info(logPrefix + "file2 ");
byte[] buf = new byte[564000];
logger.info(logPrefix + "file3 ");
int bytesRead = 0;
while ((len = inBuff.read(buf)) > 0) {
    logger.info(logPrefix + "file4 ");
    logger.info("len " + len); 
    
    strFileContents += new String(buf, 0, bytesRead); 
    logger.info("bytesRead " + bytesRead); 

    //String string = new String(buf, "UTF-8"); 
    //stream.write(buf, 0, len);
    logger.info("strFileContents " + strFileContents);
}

Log output

INFO  : [prefix] file1 
INFO  : [prefix] file2 
INFO  : [prefix] file3 
INFO  : [prefix] file4 
INFO  : len 43681
INFO  : bytesRead 0
INFO  : strFileContents

You're assigning the bytes read to `len`, but then you use `bytesRead`, which is always 0, to create the string. BTW: your approach could fail with UTF-8, because a character might straddle two reads. – Mark Rotteveel, Sep 29 '21 at 12:57
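To make the straddling point concrete, here is a minimal, self-contained sketch (the class and method names are made up for illustration). It splits the UTF-8 bytes of a short string at one position, as a `read()` boundary might, and decodes each half on its own. That is effectively what the question's loop does once it uses `len` correctly.

```java
import java.nio.charset.StandardCharsets;

public class StraddleDemo {
    // Decodes the bytes as two independent chunks split at splitAt,
    // the way a read loop would if one read() ended at that byte.
    static String decodeInTwoChunks(byte[] bytes, int splitAt) {
        String first = new String(bytes, 0, splitAt, StandardCharsets.UTF_8);
        String second = new String(bytes, splitAt, bytes.length - splitAt, StandardCharsets.UTF_8);
        return first + second;
    }

    public static void main(String[] args) {
        String original = "a\u20ACb"; // the euro sign takes three bytes in UTF-8
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8); // 5 bytes in total

        String whole = new String(bytes, StandardCharsets.UTF_8);
        String chunked = decodeInTwoChunks(bytes, 2); // split inside the euro sign

        System.out.println(whole.equals(original));          // true
        System.out.println(chunked.equals(original));        // false
        System.out.println(chunked.indexOf('\uFFFD') >= 0);  // true: replacement characters
    }
}
```

Splitting at index 1 (between `a` and the euro sign) would decode cleanly, which is why this kind of bug tends to show up only occasionally.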

Joachim Sauer · Answer 1 · 2021-09-29T13:12:37.150

3

The first problem in this code is that you never assign anything to bytesRead, so it stays 0 and new String(buf, 0, bytesRead) always produces an empty string.

But even if you fix this, you are still fundamentally reading the text file the wrong way, because you're manually decoding byte[] to String in essentially arbitrary chunks. That only works if the encoding always maps one character to exactly one byte. As soon as you use a wide encoding (such as UCS-2) or a variable-length encoding (such as UTF-8 or UTF-16), a read can end in the middle of a character. Each chunk is then decoded on its own, and the split character comes out as garbage (typically U+FFFD replacement characters).
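If you would rather stay with InputStream and byte[], one way around the chunking problem (possibly where the commented-out stream.write(buf, 0, len) line in the question was heading) is to collect all bytes in a ByteArrayOutputStream and decode only once at the end. A sketch, assuming the content is UTF-8; ByteArrayInputStream stands in for the vault stream, and readUtf8 is a hypothetical helper name:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadAllThenDecode {
    // Copies every byte into memory first and decodes only once at the end,
    // so no multi-byte character can be cut in half by a read() boundary.
    static String readUtf8(InputStream in, int bufferSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufferSize];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len); // use len, the number of bytes actually read
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "a\u20ACb";
        // Stand-in for fs.getInputStreamFromVault(location)
        InputStream in = new ByteArrayInputStream(original.getBytes(StandardCharsets.UTF_8));
        // A 2-byte buffer forces a read boundary inside the 3-byte euro sign
        System.out.println(readUtf8(in, 2).equals(original)); // true
    }
}
```

This keeps a full copy of the raw bytes in memory before decoding, which is harmless for a 42.7 KB file.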

The correct way to read text from an InputStream is to wrap it in an InputStreamReader:

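import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

InputStream in = fs.getInputStreamFromVault(location);
StringBuilder contentBuilder = new StringBuilder();

// try-with-resources closes the reader (and with it the underlying stream)
try (Reader inReader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
    char[] buf = new char[64 * 1024]; // arbitrary buffer size
    int charsRead;
    // read() returns -1 at end of stream and never 0 for a non-empty buffer
    while ((charsRead = inReader.read(buf)) > 0) {
        contentBuilder.append(buf, 0, charsRead);
    }
}

String strFileContents = contentBuilder.toString();
logger.info("strFileContents " + strFileContents);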
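Since fs.getInputStreamFromVault(location) only exists in the asker's codebase, here is a self-contained sketch of the same Reader loop run against an in-memory stream. OneByteAtATime is a hypothetical test helper that returns at most one byte per read, which forces every multi-byte character to straddle reads of the underlying stream. The decoded text still matches, because InputStreamReader carries a partially read character over to the next read.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {
    // Hypothetical helper: hands out at most one byte per read() call.
    static class OneByteAtATime extends FilterInputStream {
        OneByteAtATime(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // Same loop as in the answer, with the buffer size as a parameter.
    static String readAll(InputStream in, int bufferSize) throws IOException {
        StringBuilder contentBuilder = new StringBuilder();
        try (Reader inReader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[bufferSize];
            int charsRead;
            while ((charsRead = inReader.read(buf)) > 0) {
                contentBuilder.append(buf, 0, charsRead);
            }
        }
        return contentBuilder.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "a\u20ACb \u00E9t\u00E9";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        InputStream in = new OneByteAtATime(new ByteArrayInputStream(bytes));
        System.out.println(readAll(in, 4).equals(original)); // true
    }
}
```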

Note that this also replaces string concatenation with a StringBuilder. That probably doesn't matter for such a small file, but it is still a good habit to get into: each += in a loop copies the entire string built so far, so the total work grows quadratically with the file size.

Last but not least, this specifies the encoding explicitly instead of depending on the platform default encoding.
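To see why naming the charset matters, this sketch decodes the same UTF-8 bytes once as UTF-8 and once as ISO-8859-1, the latter standing in for a non-UTF-8 platform default. (On JDK 18 and later the default is UTF-8 everywhere, per JEP 400; on older JVMs it depends on the OS and locale.) The class name is illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decodes the same bytes with whatever charset is passed in.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 is encoded in UTF-8 as the two bytes 0xC3 0xA9
        byte[] bytes = "caf\u00E9".getBytes(StandardCharsets.UTF_8);

        System.out.println(decode(bytes, StandardCharsets.UTF_8).equals("caf\u00E9"));             // true
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).equals("caf\u00C3\u00A9")); // true: mojibake

        // new String(bytes) without a charset silently uses this value
        System.out.println(Charset.defaultCharset());
    }
}
```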

edited Sep 29 '21 at 13:12

answered Sep 29 '21 at 12:57

Joachim Sauer


`Reader.read()` returns `-1` when the end of the stream is reached. – Sep 29 '21 at 13:11
contentBuilder.append – user2568374 Sep 29 '21 at 13:12
@saka1029: that is correct. Do you propose something is wrong about my code? (Edit: fixed two typos). – Joachim Sauer Sep 29 '21 at 13:13
AHH, very good! Works (after I corrected the typos ;)). Very educational about input streams, which are a fog to me. – user2568374 Sep 29 '21 at 13:15

@user2568374: the fundamental thing to remember is: for binary data (or when you don't care about the actual content of the data) use `byte[]`/`InputStream`/`OutputStream` or `ByteBuffer`. If you are handling text data use `String`/`Reader`/`Writer` or `CharBuffer`. Only ever convert between the two by providing a character set. If you don't specify one, then it's very likely a bug. – Joachim Sauer Sep 29 '21 at 13:17
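As a compact illustration of that rule, assuming Java 9 or later for InputStream.readAllBytes(): bytes stay bytes until a single, explicit conversion to text, and the way back also names the charset. Class and method names here are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BoundaryDemo {
    // Binary side: read the raw bytes (InputStream.readAllBytes() exists since Java 9).
    // Text side: convert exactly once, naming the charset explicitly.
    static String toText(InputStream in) throws IOException {
        byte[] raw = in.readAllBytes();
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Converting text back to bytes names the charset too.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "a\u20ACb";
        InputStream in = new ByteArrayInputStream(toBytes(original));
        System.out.println(toText(in).equals(original)); // true
    }
}
```

readAllBytes() loads the whole stream into memory, so it suits small files like this one; for very large inputs the Reader loop from the answer is the better fit.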
