3

I need to generate a csv through PHP in UTF16-LE to support Excel (on Windows and Mac OS X). As suggested here I used mb_convert_encoding and added the BOM at the file start, followed by sep=; in order to make it open properly on Excel.

header('Content-Type: application/csv; charset=UTF-16LE');
header('Content-Disposition: attachment; filename=export.csv');
$output = fopen('php://output', 'w');
fputs($output, mb_convert_encoding("\xEF\xBB\xBF" . "sep=;\n" . implode($labels, ";") . "\n", 'UTF-16LE', 'UTF-8'));
foreach($data as $data_line) {
    fputs($output, mb_convert_encoding(implode($data_line, ";") . "\n", 'UTF-16LE', 'UTF-8'));
}

The character encoding is ok, but when I try to open it in OpenOffice here is what I get:

Open Office UTF16-LE

The sep=;\n isn't recognized - it shouldn't be on the first line. I don't think it's a BOM issue, because when I open it with an hex editor this is what I get:

UTF16-LE BOM

The BOM seems to be correct, as it's ÿþ which is the UTF16-LE BOM. I tried with \r\n in place of \n after sep, with no luck.

Community
  • 1
  • 1
vard
  • 4,057
  • 2
  • 26
  • 46
  • Is `sep=[value]` an OpenOffice specific directive? I can't seem to find any references to it. – Mr. Llama Oct 21 '15 at 14:37
  • I don' think it's a standard as I can't find it in the CSV specification, but it's been used for Excel to force it to use `;` as separator and not tabulations (see [this](http://superuser.com/questions/647948/space-or-tabs-as-separator-in-csv) for example). But Excel doesn't recognize it too, so maybe I use it wrong. – vard Oct 21 '15 at 14:48
  • Shouldn't the `sep=;\n` string also be in UTF-16LE? – Ilmari Karonen Oct 21 '15 at 17:30

1 Answers1

1

I can't be sure if this is the cause of your problems, but an obvious issue I see is that you haven't encoded the sep=;\n string as UTF-16LE.

To fix this, change your first fputs() line to:

$bom = "\xEF\xBB\xBF";
$header = $bom . "sep=;\n" . implode($labels, ";") . "\n";
fputs($output, mb_convert_encoding($header, 'UTF-16LE', 'UTF-8'));

(The string \xEF\xBB\xBF is the Unicode Byte Order Mark in the UTF-8 encoding; it will yield the correct BOM when converted to UTF-16.)

Ilmari Karonen
  • 49,047
  • 9
  • 93
  • 153
  • 1
    Ah it was a good catch, but sadly I end up with the same result (using the UTF-8 BOM and wrap all with `mb_convert_encoding` - I updated the question code to reflect this). – vard Oct 22 '15 at 07:26