How can I force PHP to add the BOM when using utf8_encode?

Here's what I am trying to do:

$zip->addFromString($filename, utf8_encode($xml));

Unfortunately (for me), the result does not have the BOM at the beginning.

Devin Burke
Jeano

1 Answer

Have you tried adding one yourself?

The UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF, so you can prepend it to your string after converting it to UTF-8.

$utf8_with_bom = chr(239) . chr(187) . chr(191) . $utf8_string;
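As a minimal sketch of the same idea, assuming the input is already UTF-8: the helper name `with_utf8_bom` is hypothetical (it is not a PHP built-in), and it skips the prefix when the string already starts with a BOM, so calling it twice is harmless.

```php
<?php
// Hypothetical helper; the name is an assumption for illustration only.
function with_utf8_bom(string $utf8): string
{
    $bom = "\xEF\xBB\xBF"; // UTF-8 byte order mark

    // Avoid a double BOM if one is already present.
    if (strncmp($utf8, $bom, 3) === 0) {
        return $utf8;
    }

    return $bom . $utf8;
}

$xml = '<?xml version="1.0" encoding="UTF-8"?><root/>';
$out = with_utf8_bom($xml);

echo bin2hex(substr($out, 0, 3)), "\n"; // efbbbf
```

In the question's code, that would look something like `$zip->addFromString($filename, with_utf8_bom($xml));`, provided `$xml` is already UTF-8.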

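Watch out, though. utf8_encode expects an ISO-8859-1 string and converts every byte as if it were ISO-8859-1, so running it on XML that is already UTF-8 will double-encode any non-ASCII characters. If you're working with XML, make sure that it isn't already UTF-8 encoded. The comments on the documentation suggest that the function is broken in a variety of fun ways, so you shouldn't throw it around unless you know that you need it. Note also that utf8_encode is deprecated as of PHP 8.2.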

Remember, PHP strings are simply dumb, unknowing bytes. They don't have a character set attached to them, so if the data in the string is already UTF-8, you don't need to run the conversion.
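One way to avoid converting twice is to check the bytes first. The following is only a sketch: `ensure_utf8` is a hypothetical helper name, it assumes the mbstring extension is available, and it assumes that anything which is not valid UTF-8 is ISO-8859-1. That guess can be wrong, since some ISO-8859-1 byte sequences also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the bytes are not
// already valid UTF-8. Assumption: non-UTF-8 input is ISO-8859-1.
function ensure_utf8(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, leave it alone
    }

    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";     // "café" encoded as ISO-8859-1
$utf8   = "caf\xC3\xA9"; // "café" encoded as UTF-8

var_dump(ensure_utf8($latin1) === $utf8); // bool(true)
var_dump(ensure_utf8($utf8) === $utf8);   // bool(true)
```

If you know the source encoding for certain, it is safer to convert unconditionally from that encoding than to rely on detection.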

Also, the linked Wikipedia article says this:

While Unicode standard allows BOM in UTF-8, it does not require or recommend it. Byte order has no meaning in UTF-8 so a BOM only serves to identify a text stream or file as UTF-8 or that it was converted from another format that has a BOM.

You probably don't need to bother with the BOM tapdance to begin with.

Charles
  • I had a problem where Excel wouldn't open my UTF-8 CSV correctly without the BOM so it may not be required but it certainly can make a difference. – OrganicPanda Jul 05 '13 at 08:46
  • You can make the number seem less "magical" by doing `chr(0xEF).chr(0xBB).chr(0xBF)` - this way you can see that it's hex, and from there understand better that it's the BOM. – Niet the Dark Absol Jul 18 '14 at 15:37
  • If you use some old editor, e.g. EditPlus, then the 'find in file' function can only search and recognize files with foreign characters if they are encoded as UTF-8 with a BOM. – Scott Chu Apr 19 '16 at 09:03
  • Keep in mind that for the .CSV file to work in Excel for Mac, UTF8 BOM and encoding won't work - you need to convert your data to UTF16-LE *and* add a UTF16-LE BOM - http://stackoverflow.com/a/16766198/324220 – Luka Ramishvili Mar 30 '17 at 13:57
  • I pledge to you my firstborn child. Thank you. – Kenny Wyland Aug 03 '18 at 20:25
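For the UTF-16LE case mentioned in the comments, a rough sketch might look like the following. The helper name `to_utf16le_with_bom` is hypothetical, the mbstring extension is assumed, and the input is assumed to be valid UTF-8. Whether a given Excel version accepts the result is not something this snippet can verify.

```php
<?php
// Hypothetical helper: re-encode UTF-8 text as UTF-16LE and prepend
// the UTF-16LE byte order mark (0xFF 0xFE).
function to_utf16le_with_bom(string $utf8): string
{
    // mb_convert_encoding does not add a BOM itself, so prepend it by hand.
    return "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
}

echo bin2hex(to_utf16le_with_bom("A")), "\n"; // fffe4100
```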