I read a string from a file with encoding "UTF-8", and I need to match it against a regular expression.
The first character of the file is `#`, but in the string the first character is '' (an empty symbol). I converted that character to bytes with charset "UTF-8" and got [-17, -69, -65]. Does anyone know what it is and how to deal with it in regular expressions?
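For reference, a minimal sketch that reproduces the behaviour (the class name `BomDemo` and the sample bytes are hypothetical). Java's UTF-8 decoder does not remove a leading BOM, so it stays in the string as the invisible character U+FEFF, and re-encoding that character gives exactly the three bytes above. The last two lines show a pattern starting with `#` failing, and one that allows an optional leading `\uFEFF` matching.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomDemo {
    // Hypothetical file content: the UTF-8 BOM (EF BB BF) followed by "#comment".
    static final byte[] RAW = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '#', 'c', 'o', 'm', 'm', 'e', 'n', 't'
    };

    public static void main(String[] args) {
        // Decoding as UTF-8 keeps the BOM as the character U+FEFF.
        String s = new String(RAW, StandardCharsets.UTF_8);
        char first = s.charAt(0);
        System.out.printf("U+%04X%n", (int) first);            // U+FEFF

        // Re-encoding that single character gives back the bytes from the question.
        byte[] back = String.valueOf(first).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(back));             // [-17, -69, -65]

        // A pattern starting with "#" fails; allowing an optional U+FEFF matches.
        System.out.println(s.matches("#.*"));                  // false
        System.out.println(s.matches("\\uFEFF?#.*"));          // true
    }
}
```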

pablosaraiva

itun
Can you paste a [hexdump](http://en.wikipedia.org/wiki/Hex_dump) of the start of the file? That is, the raw data before Java even touches it. – Sep 21 '11 at 04:25
1 Answer
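Some editors (like Notepad) add a BOM (byte order mark) signature when saving UTF-8 text. These are the bytes from the question: Java prints 0xEF, 0xBB, 0xBF as the signed values -17, -69, -65. You should check for the bytes 0xEF, 0xBB, 0xBF before reading a string from such a file, and skip them if they are present.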
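A minimal sketch of that check, assuming the whole file fits in memory; the class and method names (`BomSkipper`, `hasUtf8Bom`, `decodeUtf8SkippingBom`) and the file name `input.txt` are hypothetical. It looks at the raw bytes first and only then decodes them, so the resulting string starts with `#`.

```java
import java.nio.charset.StandardCharsets;

public class BomSkipper {
    /** Returns true if data starts with the UTF-8 BOM bytes EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Decodes UTF-8 bytes to a String, skipping a leading BOM if present. */
    static String decodeUtf8SkippingBom(byte[] data) {
        int offset = hasUtf8Bom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In a real program the bytes would come from the (hypothetical) file, e.g.
        // byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("input.txt"));
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '#', 'x'};
        String s = decodeUtf8SkippingBom(data);
        System.out.println(s.matches("#.*"));  // true: the first char is now '#'
    }
}
```

For large files the same check can be done on a stream instead, for example by reading the first three bytes through a `java.io.PushbackInputStream` and unreading them when they are not a BOM.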
Another way is to stop using Notepad for editing UTF-8 text and use another program, such as Notepad++ or Kate, that lets you control whether a BOM is added.

Yarg