
So, I have an issue that really bothers me. I have a simple parser that I wrote in Java. Here is the relevant piece of code:

while ((line = br.readLine()) != null)
{
    String[] splitted = line.split(SPLITTER);
    int docNum = Integer.parseInt(splitted[0].trim());
    // do something
}

The input file is a CSV file, with the first entry of the file being an integer. When I start parsing, I immediately get this exception:

Exception in thread "main" java.lang.NumberFormatException: For input string: "1"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at dipl.parser.TableParser.parse(TableParser.java:50)
at dipl.parser.DocumentParser.main(DocumentParser.java:87)

I checked the file; it indeed has 1 as its first value (no other characters are in that field), but I still get the message. I think it may be because of the file encoding: it is UTF-8 with Unix line endings, and the program runs on Ubuntu 14.04. Any suggestions on where to look for the problem are welcome.

Milan Todorovic

1 Answer


You have a BOM in front of that number; if I copy what looks like "1" in your question and paste it into vim, I see that you have a FE FF (i.e., a BOM) in front of it. From that link:

The exact bytes comprising the BOM will be whatever the Unicode character U+FEFF is converted into by that transformation format.

So that's the issue: consume the file with the appropriate reader for the transformation format (UTF-8, UTF-16 big-endian, UTF-16 little-endian, etc.) the file is encoded with. See also this question and its answers for more about reading Unicode files in Java.
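For example, here is a minimal sketch of that approach (not your actual code: the file name input.csv, the UTF-8 charset, and the comma delimiter are assumptions) that opens the file with an explicit charset and drops a leading U+FEFF if one survives decoding:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BomAwareParser {
    private static final String SPLITTER = ",";   // assumed delimiter

    public static void main(String[] args) throws IOException {
        // Decode with an explicit charset rather than the platform default.
        try (BufferedReader br = Files.newBufferedReader(
                Paths.get("input.csv"), StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (first) {
                    // After decoding, a BOM shows up as the single character U+FEFF
                    // at the start of the first line; drop it before parsing.
                    if (!line.isEmpty() && line.charAt(0) == '\uFEFF') {
                        line = line.substring(1);
                    }
                    first = false;
                }
                String[] splitted = line.split(SPLITTER);
                int docNum = Integer.parseInt(splitted[0].trim());
                // do something with docNum
            }
        }
    }
}

If the file might arrive in other encodings (the UTF-16 variants and so on), a BOM-detecting wrapper such as Apache Commons IO's BOMInputStream is another option: it detects a leading BOM, excludes it from the stream, and can tell you which charset it implies.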

T.J. Crowder
  • @Doval: **Thank you,** I was absolutely wrong to say it was a UTF-8 BOM, and you're quite right that on the wire, the BOM for UTF-8 is EF BB BF. But what we're looking at is the *end result* of reading the file and then seeing the output in the error message. The file might be in any transformation; all BOMs end up being FE FF *once read*. – T.J. Crowder Sep 26 '16 at 18:12
  • But if it was read *raw*, then...oh, I don't know. :-) Could well have been UTF-16. :-) It'll all depend on how the file was read into the stream. – T.J. Crowder Sep 26 '16 at 18:29
  • "all BOMs end up being FE FF once read" - Not quite. All BOMs end up being U+FEFF (which is not the same as 0xFE 0xFF, since it's a code point rather than a sequence of bytes) once *decoded*. Before decoding, all you have is bytes, which may be in any encoding that can represent Unicode characters (mostly UTF-8 and UTF-16, but others exist). – Kevin Sep 26 '16 at 19:59
  • @Kevin: Yes, that's what I meant. – T.J. Crowder Sep 27 '16 at 02:32
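To make the bytes-versus-code-point distinction from these comments concrete, here is a small self-contained demonstration (the byte values are hypothetical, not taken from the asker's file):

import java.nio.charset.StandardCharsets;

public class BomDemo {
    public static void main(String[] args) {
        // On the wire, a UTF-8 BOM is the byte sequence EF BB BF ...
        byte[] raw = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) '1' };

        // ... but once decoded it becomes the single code point U+FEFF.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded.length());        // 2: U+FEFF followed by '1'
        System.out.println((int) decoded.charAt(0)); // 65279, i.e. U+FEFF

        // Integer.parseInt(decoded) would throw
        // NumberFormatException: For input string: "1"
        // because the invisible BOM character is still in the string.

        // Stripping the decoded BOM fixes the parse:
        System.out.println(Integer.parseInt(decoded.replace("\uFEFF", ""))); // 1
    }
}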