
I get the following log messages from this code, and I'm not sure why I am not getting the output in the string. Can someone advise if I am converting something incorrectly? len has a value, so I know the InputStream is good. location is also verified as good.

The file I'm reading is 42.7 KB.

FileServices fs = new FileServices("");
InputStream in = fs.getInputStreamFromVault(location);

int len;
String strFileContents = "";

logger.info(logPrefix + "file1 ");
BufferedInputStream inBuff = new BufferedInputStream(in);
logger.info(logPrefix + "file2 ");
byte[] buf = new byte[564000];
logger.info(logPrefix + "file3 ");
int bytesRead = 0;
while ((len = inBuff.read(buf)) > 0) {
    logger.info(logPrefix + "file4 ");
    logger.info("len " + len); 
    
    strFileContents += new String(buf, 0, bytesRead); 
    logger.info("bytesRead " + bytesRead); 

    //String string = new String(buf, "UTF-8"); 
    //stream.write(buf, 0, len);
    logger.info("strFileContents " + strFileContents);
}

Log output

INFO  : [prefix] file1 
INFO  : [prefix] file2 
INFO  : [prefix] file3 
INFO  : [prefix] file4 
INFO  : len 43681
INFO  : bytesRead 0
INFO  : strFileContents 
Mark Rotteveel
user2568374
  • You're assigning the bytes read to `len`, but then you use `bytesRead`, which is always 0, to create the string. BTW: Your approach could fail with UTF-8, because a character might straddle two reads. – Mark Rotteveel Sep 29 '21 at 12:57

1 Answer


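The first problem in this code is that you never assign anything to bytesRead: the number of bytes read goes into len, while bytesRead stays 0, so new String(buf, 0, bytesRead) always produces an empty string.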

But even if you fix this, you are still reading the text file the wrong way, because you are manually decoding byte[] to String in essentially arbitrary chunks. That works only if the encoding happens to use exactly one byte per character. As soon as you use a wide encoding (such as UCS-2) or a variable-length encoding (such as UTF-8 or UTF-16), a character can be split across two reads, and each half is then decoded incorrectly on its own.
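To make the chunking problem concrete, here is a minimal sketch. The class name ChunkedDecodeDemo and the helper decodeInChunks are made up for illustration, and a ByteArrayInputStream with a deliberately tiny buffer stands in for the real file so that a two-byte UTF-8 character is forced across a chunk boundary:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedDecodeDemo {

    // Hypothetical helper: decodes every chunk on its own, like the loop in the
    // question would do once len is used instead of bytesRead.
    public static String decodeInChunks(byte[] data, int chunkSize) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        byte[] buf = new byte[chunkSize];
        StringBuilder sb = new StringBuilder();
        int len;
        while ((len = in.read(buf, 0, buf.length)) > 0) {
            // Each chunk is decoded in isolation; a character whose bytes are
            // split across two chunks cannot be decoded correctly here.
            sb.append(new String(buf, 0, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "a\u00e9"; // 'a' is 1 byte in UTF-8, 'é' is 2 bytes
        byte[] data = original.getBytes(StandardCharsets.UTF_8);

        // Chunk size 2 splits the two bytes of 'é' across two reads.
        String broken = decodeInChunks(data, 2);
        // Chunk size 3 happens to keep the character intact.
        String intact = decodeInChunks(data, 3);

        System.out.println("chunk size 2 round-trips: " + original.equals(broken));
        System.out.println("chunk size 3 round-trips: " + original.equals(intact));
    }
}
```

With a 564000-byte buffer and a 42.7 KB file the split may never happen in practice, which is exactly what makes this kind of bug hard to notice.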

The correct way to read text from an InputStream is to wrap it in an InputStreamReader:

InputStream in = fs.getInputStreamFromVault(location);
Reader inReader = new InputStreamReader(in, StandardCharsets.UTF_8);

StringBuilder contentBuilder = new StringBuilder();
char[] buf = new char[64*1024]; //arbitrary buffer size
int charsRead;
while ((charsRead = inReader.read(buf)) > 0) {
    contentBuilder.append(buf, 0, charsRead);
}

String strFileContents = contentBuilder.toString();
logger.info("strFileContents " + strFileContents);

Note that this also replaces string concatenation with a StringBuilder. That probably doesn't matter for such a small file, but it is still a good habit to get into, because string concatenation in a loop copies the whole accumulated string on every iteration.

Last but not least, this also specifies the encoding explicitly instead of depending on the platform default encoding.
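For anyone who wants to try this outside the original project, here is a self-contained sketch of the same loop. The class ReadTextDemo and the helper readText are hypothetical names, and a ByteArrayInputStream stands in for fs.getInputStreamFromVault(location), which is not shown in the question. It also uses try-with-resources so that the reader is closed, and checks for -1, which is what Reader.read() returns at the end of the stream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadTextDemo {

    // Hypothetical helper: reads the whole stream as text in the given charset.
    public static String readText(InputStream in, Charset charset) throws IOException {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream),
        // even if reading throws.
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[8 * 1024]; // arbitrary buffer size
            int charsRead;
            // read() returns -1 once the end of the stream is reached
            while ((charsRead = reader.read(buf)) != -1) {
                content.append(buf, 0, charsRead);
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the vault stream: UTF-8 bytes with multi-byte characters.
        byte[] bytes = "gr\u00fc\u00dfe, \u4e16\u754c".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(bytes);

        String text = readText(in, StandardCharsets.UTF_8);
        System.out.println("strFileContents " + text);
    }
}
```

Because the InputStreamReader keeps incomplete byte sequences until the remaining bytes arrive, the buffer size no longer affects the decoded result.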

Joachim Sauer
  • `Reader.read()` returns `-1` when the end of the stream is reached. –  Sep 29 '21 at 13:11
  • contentBuilder.append – user2568374 Sep 29 '21 at 13:12
  • @saka1029: that is correct. Do you propose something is wrong about my code? (Edit: fixed two typos). – Joachim Sauer Sep 29 '21 at 13:13
  • AHH, very good! Works. (after I corrected typos ;)) Very educational in input streams which is a fog to me. – user2568374 Sep 29 '21 at 13:15
  • 1
    @user2568374: the fundamental thing to remember is: for binary data (or when you don't care about the actual content of the data) use `byte[]`/`InputStream`/`OutputStream` or `ByteBuffer`. If you are handling text data use `String`/`Reader`/`Writer` or `CharBuffer`. Only ever convert between the two by providing a character set. If you don't specify one, then it's very likely a bug. – Joachim Sauer Sep 29 '21 at 13:17
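As a small illustration of that rule, the sketch below converts between text and bytes with an explicit character set in both directions, and then shows what happens when the decoding side picks a different one. The class name CharsetBoundaryDemo and its two helpers are made up for this example:

```java
import java.nio.charset.StandardCharsets;

public class CharsetBoundaryDemo {

    // Text -> bytes: always name the character set.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes -> text: use the same character set that produced the bytes.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // 4 characters, 5 bytes in UTF-8
        byte[] utf8 = encode(text);

        // Matching character sets round-trip cleanly.
        System.out.println("same charset round-trips: " + text.equals(decode(utf8)));

        // Decoding the same bytes as ISO-8859-1 turns each byte into one
        // character, so the two bytes of 'é' become two wrong characters.
        String mismatched = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("mismatched charset round-trips: " + text.equals(mismatched));
    }
}
```

Calling getBytes() or new String(bytes) without a character set argument makes the same choice implicitly, based on the platform default, which is why leaving it out tends to hide this kind of mismatch.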