Java adds spaces when reading in a line?

Question

So I'm at my wit's end with this program. I'm reading from a text file in Java. Leaving aside everything I do with the string once I have it, this is the minimal code that shows the problem:

            String lineIn;
            // myReader is presumably a BufferedReader opened on the text file
            while ((lineIn = myReader.readLine()) != null) {
                System.out.println("LineIn: \""+lineIn+"\"");
                System.out.println("Length: "+lineIn.length());
            }

What it prints out, however, is very strange indeed. The line should read:

001 2014/06/09 09:40:24 0.000

But this is what I get:

LineIn: "�2�6�1�8� �2�0�1�4�/�0�7�/�1�0� �2�3�:�1�5�:�0�3� �0�.�0�0�0�"
Length: 61

Here on Stack Overflow it actually displays fine, but if you copy the "LineIn:" line into your browser's address bar you may see little invisible spaces between the digits. I have no idea why they are there, what they are, or where Java is getting them from. Opening the source file in a simple text editor shows no such spacing, and copying and pasting from the text editor into the browser's address bar adds no extra spaces either. It's very peculiar, and I hope someone can offer some insight. I'm pulling my hair out here.

Just so you know, before I fix it: Java and JavaScript are as different as the Moon and the Earth. — Dici, Oct 04 '14 at 19:14
@Dici: Or as similar as car and carpet: http://stackoverflow.com/a/245068/367273 — NPE, Oct 04 '14 at 19:17
Are you using `Scanner`? Scanner is a far simpler and easier choice. `Scanner` will infer the file's encoding automatically based on the BOM at the beginning of your file. — SnakeDoc, Oct 04 '14 at 19:18

score 4 · Answer 1 · answered Oct 04 '14 at 19:16

It could be due to the formatting and encoding your reader is using; try using Scanner instead.
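Switching to `Scanner` only helps if it is told the charset: its constructors without a charset argument fall back to the platform default rather than inspecting the file. A minimal sketch, assuming the file is UTF-16 with a byte-order mark; the `ScannerUtf16` class and the in-memory sample are illustrative stand-ins for the real file:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerUtf16 {
    // Reads every line from the stream, decoding it as UTF-16.
    // The charset is passed explicitly; Scanner would otherwise use
    // the platform default charset.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_16.name())) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Simulated file contents: Java's "UTF-16" encoder writes a
        // big-endian byte-order mark followed by big-endian data.
        byte[] data = "001 2014/06/09 09:40:24 0.000\n".getBytes(StandardCharsets.UTF_16);
        for (String line : readLines(new ByteArrayInputStream(data))) {
            System.out.println("LineIn: \"" + line + "\" Length: " + line.length());
        }
    }
}
```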

SamTebbs33

Jerry101 · Accepted Answer · 2014-10-04T19:35:29.570

It looks like you're reading UTF-16 data as if it had an 8-bit encoding.
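The symptom can be reproduced with a small sketch. It assumes little-endian UTF-16 and uses ISO-8859-1 as a stand-in for "an 8-bit encoding" (the asker's actual platform default is unknown). Each ASCII character becomes two bytes, one of them zero, and the 8-bit decoding turns every zero byte into a NUL character, which is what shows up as an invisible space or as �:

```java
import java.nio.charset.StandardCharsets;

class MisdecodeDemo {
    // Encodes text as UTF-16LE (two bytes per ASCII character), then wrongly
    // decodes those bytes with an 8-bit charset, one character per byte.
    static String misdecode(String text) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16LE);
        return new String(utf16Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "001 2014/06/09 09:40:24 0.000";
        String garbled = misdecode(original);
        System.out.println(original.length()); // 29
        System.out.println(garbled.length());  // 58: each character gains a NUL
        // Make the hidden NUL characters visible as dots.
        System.out.println(garbled.replace('\0', '.'));
    }
}
```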

If you construct a java.io.InputStreamReader, you can specify the input text charset such as "UTF-16".
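A minimal sketch of that approach, keeping the question's `BufferedReader`/`readLine()` loop and changing only how the reader is constructed. The `readLines` helper and the sample data are illustrative; for a real file you would pass a `FileInputStream` instead of the in-memory stream. The sample mimics a Windows-style little-endian file with a byte-order mark, which the "UTF-16" charset detects and strips:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf16LineReader {
    // Wraps the raw byte stream in an InputStreamReader that decodes UTF-16,
    // then in a BufferedReader so readLine() works as in the question.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader myReader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_16))) {
            String lineIn;
            while ((lineIn = myReader.readLine()) != null) {
                lines.add(lineIn);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // "\uFEFF" encoded as UTF-16LE yields the bytes FF FE, i.e. a
        // little-endian byte-order mark; CRLF line endings as on Windows.
        byte[] data = "\uFEFF001 2014/06/09 09:40:24 0.000\r\n".getBytes(StandardCharsets.UTF_16LE);
        for (String lineIn : readLines(new ByteArrayInputStream(data))) {
            System.out.println("LineIn: \"" + lineIn + "\"");
            System.out.println("Length: " + lineIn.length());
        }
    }
}
```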

edited Oct 04 '14 at 19:35

answered Oct 04 '14 at 19:18

Jerry101

To be complete, you should provide a way for him to fix his encoding. Namely, using `Scanner` would infer the encoding automatically based on the BOM at the beginning of the file. – SnakeDoc Oct 04 '14 at 19:21

Good point, @SnakeDoc. One can use the charset "UTF-16" to force 16-bit decoding, and it'll read an optional byte-order mark to distinguish big vs. little endian. The doc for `java.util.Scanner` http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#Scanner(java.io.File,%20java.lang.String) says defaulting the charset uses "the underlying platform's default charset." It doesn't say it'll automatically pick from UTF-16 vs. UTF-8. – Jerry101 Oct 04 '14 at 19:31
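The byte-order-mark handling described here can be checked directly. This sketch decodes the same two characters from big-endian bytes with a BOM, little-endian bytes with a BOM, and big-endian bytes with no BOM; the byte arrays are made-up examples:

```java
import java.nio.charset.StandardCharsets;

class Utf16BomDemo {
    // Decodes raw bytes with the "UTF-16" charset, which honours an optional
    // byte-order mark and falls back to big-endian when there is none.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] bigEndianWithBom    = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x30, 0x00, 0x31};
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x30, 0x00, 0x31, 0x00};
        byte[] noBom               = {0x00, 0x30, 0x00, 0x31};
        System.out.println(decode(bigEndianWithBom));    // 01
        System.out.println(decode(littleEndianWithBom)); // 01
        System.out.println(decode(noBom));               // 01 (assumed big-endian)
    }
}
```

Because a missing BOM is treated as big-endian, a little-endian file without a BOM would need the explicit "UTF-16LE" charset instead.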

Ended up using this and it worked. Thanks :) `InputStreamReader fileInputStreamReader = new InputStreamReader(fileInStream, "UTF-16");` – Rob Oct 04 '14 at 19:34

score 2 · Answer 3 · answered Oct 04 '14 at 19:18

Java certainly is not doing that; it might be a UTF-16-encoded file. Can you upload the file, or a small part of it, somewhere?
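Short of uploading the file, dumping its first few bytes shows whether it starts with a byte-order mark. A minimal sketch; `detectBom` and `hexDump` are hypothetical helper names, only the common UTF-8 and UTF-16 BOMs are checked, and the file path is assumed to come from the command line:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomSniffer {
    // Names the encoding implied by a leading byte-order mark, or returns
    // null when none is found. UTF-32 BOMs are not checked, so FF FE 00 00
    // would be reported as UTF-16LE.
    static String detectBom(byte[] bytes) {
        if (bytes.length >= 3 && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB && (bytes[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFF && (bytes[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    // Formats up to `max` leading bytes as space-separated hex pairs.
    static String hexDump(byte[] bytes, int max) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(max, bytes.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the text file being investigated.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("First bytes: " + hexDump(bytes, 16));
        String bom = detectBom(bytes);
        System.out.println("BOM: " + (bom == null ? "none found" : bom));
    }
}
```

For plain ASCII content, a dump that starts with `FF FE` and has `00` in every other byte after that would point to little-endian UTF-16.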

Anti Veeranna
