0

I am facing a strange issue when extracting data from a MySql database and inserting it in a CSV file. In the database, the field value is the following:

K Secure Connection 1 año 1 PC

When I echo it before writing it to the CSV file, I get the same as the above in my terminal.

I use the following code to write content to the CSV file:

fwrite($this->fileHandle, utf8_encode($lineContent . PHP_EOL));

Yet, when I open the CSV with LibreOffice Calc (and specify UTF-8 as the encoding format), the following is displayed:

K Secure Connection 1 año 1 PC

I have no idea why this happens. Can someone explain how to solve this?

REM:

SELECT @@character_set_database;

returns

latin1 

REM 2:

`var_dump($lineContent, bin2hex($lineContent))`

gives

string(39) "Kaspersky Secure Connection 1 año 1 PC"
string(78) "4b6173706572736b792053656375726520436f6e6e656374696f6e20312061c3b16f2031205043"
Jérôme Verstrynge
  • 57,710
  • 92
  • 283
  • 453
  • Can you include a dump of that CSV line after it's been written to file? It'll help determine if it's LibreOffice misinterpreting or PHP for writing it incorrectly. – Mr. Llama Jun 23 '16 at 14:35
  • I have opened the CSV with gedit (a text editor) and the bad characters are there (i.e., before opening LibreOffice) – Jérôme Verstrynge Jun 23 '16 at 14:39
  • is your database utf-8, the connect to it in utf8 etc ? –  Jun 23 '16 at 14:47
  • The character set seems to be latin1 – Jérôme Verstrynge Jun 23 '16 at 14:54
  • 2
    `var_dump($lineContent, bin2hex($lineContent))` – what does that give you? It's very unclear what encoding the string is actually in; the easiest way to figure that out is to look at the actual bytes. – Secondarily, why are you `utf8_encode`ing? Are you explicitly wanting to convert from Latin1 to UTF-8? – deceze Jun 23 '16 at 15:02
  • @deceze: I have added the `vardump` result in my question. The target CSV must be in utf-8. – Jérôme Verstrynge Jun 23 '16 at 15:28

2 Answers2

3

The var_dump shows that the string is already encoded in UTF-8. Using utf8_encode on it will garble it (the function attempts a conversion from Latin-1 to UTF-8). You're therefore actually writing "año" encoded in UTF-8 into your file, which is then "correctly" picked up by LibreOffice.

Simply don't utf8_encode.

deceze
  • 510,633
  • 85
  • 743
  • 889
0

I would try to open the csv file with other editor just to make sure te problem is not with the office...

You may be double encoding the content if it is already in UTF-8 format.

I also prefer to aways work with UTF-8, so I get the data from database already in UTF-8 and no more convertion is needed. For that I run this query right after opening the SQL connection:

"set names 'utf8'"
Sergio Bernardo
  • 368
  • 1
  • 8
  • And by the way... check this related post... this may be your problem too: [http://stackoverflow.com/questions/6336586/fwrite-and-utf8](http://stackoverflow.com/questions/6336586/fwrite-and-utf8) – Sergio Bernardo Jun 23 '16 at 14:40