I want to export some data containing a pound sign (£) to a CSV file, and I am adding a BOM so that the pound sign is displayed correctly. When I write to the PHP output stream php://output, the BOM is dropped somewhere along the way, so the file does not contain those 3 bytes. Interestingly, if the BOM string is duplicated (something like \xef\xbb\xbf\xef\xbb\xbf), the first 3 bytes are dropped and only the last 3 bytes end up in the file. This reproduces only when writing to the output stream. When I write to a real file, everything works as expected.

EDIT: To clarify my question: does anyone know why it works this way? And how can I solve the problem without hacks such as duplicating the BOM string?

EDIT: I'm using Symfony 2.8 StreamedResponse, so the code works like this:

$f = fopen('php://output', 'r+');
fwrite($f, "\xef\xbb\xbf\xef\xbb\xbf"); // only 3 bytes will exist in the file
// other code: fputcsv(...), ..., fflush($f)

I'm checking the files with https://hexed.it/

Илья Савич
  • Thanks for the info – brombeer Feb 20 '18 at 10:04
  • And your question is? – Kevin Kopf Feb 20 '18 at 10:05
  • My question is: why does this happen? And how can I solve the problem without a crutch like the duplication? – Илья Савич Feb 20 '18 at 10:16
  • How are you examining what bytes are on the output stream? Some concrete examples of the code you're using would be useful, including the code *outside* the PHP that's consuming the output. – IMSoP Feb 20 '18 at 10:40
  • For what it's worth, I did a quick test on the terminal of what I *think* you're describing, and it doesn't reproduce the problem, so you definitely need to include some code, preferably a "[mcve]". `php -r '$f = fopen("php://output", "wb"); fwrite($f, "\xef\xbb\xbfHello World"); fclose($f);' | xxd` displays `0000000: efbb bf48 656c 6c6f 2057 6f72 6c64 ...Hello World` as expected. – IMSoP Feb 20 '18 at 10:44
  • Your edit doesn't really help us reproduce the problem. I already showed that just using `fwrite` works exactly as expected for me. Since you're outputting to standard output, but checking files in a hex editor, you must be doing something to direct that output to a file. Are you loading the result in a web browser? What browser? How are you saving it to a file? I strongly suspect that something else is interpreting and removing the BOM, and PHP is outputting it just fine. Maybe using something like `curl` or `wget` to see the raw response from the web server would confirm this. – IMSoP Feb 20 '18 at 11:03
  • I'm sending the response with the headers `Content-Disposition:attachment; filename="file.csv" Content-Transfer-Encoding:binary Content-Type:application/octet-stream`. The raw response is `"Spend Month","Commission Number"...`. So you are right, the problem is not in PHP. But maybe you know why this happens? The browser is Chrome. – Илья Савич Feb 20 '18 at 11:26

1 Answer

It seems this is how the UTF-8 decode algorithm works in browsers; see https://stackoverflow.com/a/42717677
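
As a rough illustration of that behaviour, here is a minimal sketch. It assumes the consumer drops exactly one leading BOM while decoding. The helper `stripOneLeadingBom` is hypothetical and only mimics such a consumer; it is not a browser or PHP API. The sketch captures what PHP writes to `php://output` with output buffering, which shows that PHP emits both BOMs and that a single stripped BOM leaves the other one in place, matching the doubled-BOM observation in the question.

```php
<?php
// Hypothetical helper (an assumption for illustration): mimics a consumer
// that removes one leading UTF-8 BOM, if present, and nothing else.
function stripOneLeadingBom(string $bytes): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($bytes, $bom, 3) === 0 ? substr($bytes, 3) : $bytes;
}

// Capture exactly what PHP itself writes to php://output.
ob_start();
$f = fopen('php://output', 'wb');
fwrite($f, "\xEF\xBB\xBF\xEF\xBB\xBF"); // doubled BOM, as in the question
fwrite($f, "\"Spend Month\"\n");
fclose($f);
$raw = ob_get_clean();

// What a BOM-stripping consumer would be left with.
$seen = stripOneLeadingBom($raw);

echo bin2hex(substr($raw, 0, 6)), "\n";  // efbbbfefbbbf: PHP wrote both BOMs
echo bin2hex(substr($seen, 0, 6)), "\n"; // efbbbf225370: one BOM survives
```

To see the bytes the server really sends, without any decoding step in between, fetching the response with a tool such as `curl` and piping it through `xxd`, as suggested in the comments, is the more direct check.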

borN_free