Right way to deal with Unicode BOM in a text file

Question

I am reading a text file in my program that contains the Unicode BOM character (\ufeff, decimal 65279) in places. This causes several problems in later parsing.

Right now I am detecting and filtering these characters myself, but I would like to know whether the Java standard library or Guava has a cleaner way to do this.

In _places_? The BOM should be the first bytes of a file; otherwise it isn't a BOM. — Boris the Spider, Apr 13 '13 at 08:43
Assuming that the BOM is at the start of the file, [this](http://code.google.com/p/guava-libraries/issues/detail?id=345&colspec=ID%20Type%20Status%20Milestone%20Summary) bug report on the Guava website explains that Guava doesn't handle the BOM, and [this](http://stackoverflow.com/questions/9736999/how-to-remove-bom-from-an-xml-file-in-java) post gives an idea of how to skip it in plain Java. — Boris the Spider, Apr 13 '13 at 08:51
@bmorris591, yes, in the beginning. Thanks. If you post your 2nd comment as an answer, I will mark it accepted. — missingfaktor, Apr 13 '13 at 09:29

score 10 · Accepted Answer · edited May 23 '17 at 12:07

There is no built-in way of dealing with a (UTF-8) BOM in Java or, indeed, in Guava.

There is currently a bug report on the Guava website about dealing with a BOM in Guava IO.

There are several SO posts (here and here) on how to detect/skip the BOM while reading a file in plain Java.

Your BOM (\ufeff) seems to be UTF-16, which, according to the same Guava report, should be dealt with automatically by Java. This SO post seems to suggest the same.
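As a rough illustration of the plain-Java approach mentioned above, here is a minimal sketch that consumes a single leading U+FEFF after decoding. It assumes the BOM only appears at the very start of the input and that the file is decoded as UTF-8. The class name `BomSkipper` and the helpers `skipBom` and `stripBom` are hypothetical names made up for this example and are not part of the JDK or Guava.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class BomSkipper {
    // The BOM is the code point U+FEFF once the bytes have been decoded.
    private static final int BOM = 0xFEFF;

    /** Returns a reader positioned after a leading U+FEFF, if one is present. */
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader reader = new PushbackReader(in, 1);
        int first = reader.read();
        // Push the character back unless it is the BOM or we hit end of input.
        if (first != -1 && first != BOM) {
            reader.unread(first);
        }
        return reader;
    }

    /** Same idea for text that is already in memory. */
    static String stripBom(String s) {
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 encoded BOM (EF BB BF) followed by the text "hi".
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        try (BufferedReader r = new BufferedReader(skipBom(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)))) {
            System.out.println(r.readLine());
        }
    }
}
```

Checking the decoded character rather than the raw bytes keeps the sketch independent of how the BOM was encoded on disk, as long as the reader was opened with the matching charset. It does not detect the encoding from the BOM; that would need a look at the first bytes before choosing a charset.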
