I'm at my wit's end with this program. I'm reading from a text file in Java. Leaving aside everything I do with the string once I have it, this is the minimal code that shows the problem.

            while ((lineIn = myReader.readLine()) != null) {
                System.out.println("LineIn: \""+lineIn+"\"");
                System.out.println("Length: "+lineIn.length());
            }

What it prints out, however, is very strange indeed. The line should read:

001 2014/06/09 09:40:24 0.000

But this is what I get:

LineIn: "�2�6�1�8� �2�0�1�4�/�0�7�/�1�0� �2�3�:�1�5�:�0�3� �0�.�0�0�0�"
Length: 61

On Stack Overflow it actually shows up fine. You may be able to copy and paste the "LineIn: etc" into your address bar and see there are little invisible spaces in the numbering. I have no idea why those are there, what they are, and where Java is getting them from. Opening the document it's sourced from in a simple text editor shows no such spacing, and copy+pasting from the text editor into the browser address bar has no superfluous spacing either. It's very peculiar and I hope someone can offer insight. I'm pulling out my hair here.

Dici
Rob

3 Answers

It could be due to the encoding that your reader is using; try using Scanner instead.

SamTebbs33

It looks like you're reading UTF-16 data as if it had an 8-bit encoding.
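To illustrate the symptom, here is a minimal sketch that encodes the expected line as UTF-16 and then decodes the bytes with an 8-bit charset. The class name `MisdecodeDemo`, the helper `misdecode`, and the choice of little-endian UTF-16 and ISO-8859-1 are assumptions made for the demonstration; the asker's actual file and platform charset are not known.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {
    // Hypothetical helper: encode text as UTF-16LE, then decode the bytes
    // as ISO-8859-1 (one byte per char), which is the suspected mistake.
    static String misdecode(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "001 2014/06/09 09:40:24 0.000";
        String garbled = misdecode(original);

        // Every ASCII character is now followed by a NUL (\u0000),
        // so the decoded string is twice as long as the original.
        System.out.println("Original length: " + original.length());
        System.out.println("Garbled length:  " + garbled.length());
        System.out.println("Second char is NUL: " + (garbled.charAt(1) == '\u0000'));
    }
}
```

The extra NUL characters are invisible in many editors and terminals, which would be consistent with the "little invisible spaces" described in the question.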

If you construct a java.io.InputStreamReader, you can specify the input text charset such as "UTF-16".
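A minimal sketch of that approach, assuming the file really is UTF-16: the class name `Utf16LineReader` and the helper `readLines` are hypothetical, and the file path is taken from the command line because the original path is not shown.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf16LineReader {
    // Hypothetical helper: reads all lines of a file, decoding it as UTF-16.
    // The "UTF-16" charset honours a byte-order mark if one is present and
    // falls back to big-endian when there is none.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader myReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_16))) {
            String lineIn;
            while ((lineIn = myReader.readLine()) != null) {
                lines.add(lineIn);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java Utf16LineReader <file>");
            return;
        }
        for (String lineIn : readLines(args[0])) {
            System.out.println("LineIn: \"" + lineIn + "\"");
            System.out.println("Length: " + lineIn.length());
        }
    }
}
```

If the file has no byte-order mark and is little-endian, `StandardCharsets.UTF_16LE` would be the charset to try instead.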

Jerry101
  • to be complete, you should provide a way for him to fix his encoding. Namely, using `Scanner` would infer the encoding automatically based on the BOM at the beginning of the file. – SnakeDoc Oct 04 '14 at 19:21
  • Good point, @SnakeDoc. One can use the charset "UTF-16" to force 16-bit decoding, and it'll read an optional byte-order mark to distinguish big vs. little endian. The doc for `java.util.Scanner` http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#Scanner(java.io.File,%20java.lang.String) says defaulting the charset uses "the underlying platform's default charset." It doesn't say it'll automatically pick from UTF-16 vs. UTF-8. – Jerry101 Oct 04 '14 at 19:31
  • Ended up using this and it worked. Thanks :) `InputStreamReader fileInputStreamReader = new InputStreamReader(fileInStream, "UTF-16");` – Rob Oct 04 '14 at 19:34

Java is certainly not doing that; it might be a UTF-16 encoded file. Can you upload the file or a small part of it somewhere?

Anti Veeranna