UTF16-LE with BOM not recognizing sep in csv file

Question

I need to generate a csv through PHP in UTF16-LE to support Excel (on Windows and Mac OS X). As suggested here I used mb_convert_encoding and added the BOM at the file start, followed by sep=; in order to make it open properly on Excel.

header('Content-Type: application/csv; charset=UTF-16LE');
header('Content-Disposition: attachment; filename=export.csv');
$output = fopen('php://output', 'w');
fputs($output, mb_convert_encoding("\xEF\xBB\xBF" . "sep=;\n" . implode($labels, ";") . "\n", 'UTF-16LE', 'UTF-8'));
foreach($data as $data_line) {
    fputs($output, mb_convert_encoding(implode($data_line, ";") . "\n", 'UTF-16LE', 'UTF-8'));
}

The character encoding is ok, but when I try to open it in OpenOffice here is what I get:

The sep=;\n isn't recognized - it shouldn't be on the first line. I don't think it's a BOM issue, because when I open it with an hex editor this is what I get:

The BOM seems to be correct, as it's ÿþ which is the UTF16-LE BOM. I tried with \r\n in place of \n after sep, with no luck.

Is `sep=[value]` an OpenOffice specific directive? I can't seem to find any references to it. — Mr. Llama, Oct 21 '15 at 14:37
I don' think it's a standard as I can't find it in the CSV specification, but it's been used for Excel to force it to use `;` as separator and not tabulations (see [this](http://superuser.com/questions/647948/space-or-tabs-as-separator-in-csv) for example). But Excel doesn't recognize it too, so maybe I use it wrong. — vard, Oct 21 '15 at 14:48

Ilmari Karonen · Answer 1 · 2015-10-21T17:46:59.533

1

I can't be sure if this is the cause of your problems, but an obvious issue I see is that you haven't encoded the sep=;\n string as UTF-16LE.

To fix this, change your first fputs() line to:

$bom = "\xEF\xBB\xBF";
$header = $bom . "sep=;\n" . implode($labels, ";") . "\n";
fputs($output, mb_convert_encoding($header, 'UTF-16LE', 'UTF-8'));

(The string \xEF\xBB\xBF is the Unicode Byte Order Mark in the UTF-8 encoding; it will yield the correct BOM when converted to UTF-16.)

edited Oct 21 '15 at 17:46

answered Oct 21 '15 at 17:41

Ilmari Karonen

49,047
9
93
153

1

Ah it was a good catch, but sadly I end up with the same result (using the UTF-8 BOM and wrap all with `mb_convert_encoding` - I updated the question code to reflect this). – vard Oct 22 '15 at 07:26

UTF16-LE with BOM not recognizing sep in csv file

1 Answers1