7

A page in Java's Bug Database, http://bugs.sun.com/view_bug.do?bug_id=4508058, states that Sun/Oracle will not fix the problem of Java not recognizing the BOM of a UTF-8-encoded stream. Since the most recent comment on that page dates back to 2010, I would like to know whether there is any more recent information on this. Is it still true that Java cannot handle the BOM in UTF-8?

gefei
  • 1
    I disagree with how you have stated the issue, but yes: the Java encoder and decoder for UTF-8 of course make no allowance for a superfluous BOM. BOMs on UTF-8 are really bad news, and break all kinds of things. Please never use them; if you find yourself needing to specify the file encoding, then use a higher-level protocol, such as MIME headers, an embedded declaration or comment in whatever programming language it is, or the customary file extension “.utf8”. – tchrist Mar 26 '12 at 16:59
  • 3
    I agree with you. However, if you want to create a UTF-8 CSV file that users can open directly in Excel, then there is no way around the BOM. If you don't write a BOM, Excel will read the file as ANSI. (Microsoft should be sued for all the development hours their BOM has cost the world) – dstibbe Jun 08 '12 at 13:51
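To illustrate the point in the comment above: Java's UTF-8 encoder never emits a BOM, so if you want Excel to detect a CSV file as UTF-8 you have to write the BOM character (U+FEFF) yourself. A minimal sketch (the file name `example.csv` and its contents are made up for illustration):

```java
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomCsvWriter {
    public static void main(String[] args) throws Exception {
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream("example.csv"), StandardCharsets.UTF_8)) {
            // Java will not add a BOM for you; write U+FEFF explicitly.
            // It is encoded as the three bytes EF BB BF.
            w.write('\uFEFF');
            w.write("name,city\nJürgen,Köln\n");
        }
    }
}
```

The resulting file starts with the bytes `EF BB BF`, which is what Excel looks for when deciding between UTF-8 and the ANSI code page.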

1 Answer

6

Yes, it is still true that Java cannot handle the BOM in UTF-8 encoded files. I came across this issue when parsing several XML files for data-formatting purposes. Since you can't know in advance which files will contain one, I would suggest stripping the BOM at runtime if you find it, or following the advice that tchrist gave.
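One way to do the runtime stripping mentioned above is to wrap the input stream before handing it to a reader or XML parser. The sketch below is one possible approach using the standard `PushbackInputStream` (class and method names here are my own, not from any library):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomStripper {
    /** Returns a stream with a leading UTF-8 BOM (EF BB BF) skipped, if present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.read(head, 0, 3);
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return pb;
    }

    public static void main(String[] args) throws Exception {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        try (Reader r = new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(withBom)), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            System.out.println(sb); // prints "hi"
        }
    }
}
```

If you can take a dependency, Apache Commons IO ships a `BOMInputStream` class that does the same job (and also handles UTF-16/UTF-32 BOMs).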

Ocracoke
  • If you're at a loss about how to do that: a quick way is `if (text.codePointAt(0) == 0xfeff) text = text.substring(1, text.length());` (this will also catch the UTF-8 BOM `EF BB BF`). A more elaborate approach is described at: http://stackoverflow.com/questions/1835430/byte-order-mark-screws-up-file-reading-in-java/1835529#1835529 – user149408 May 24 '15 at 14:21