I have a short script that reads a CSV file. It looks like this:

$csv = new SplFileObject($pathToFile, 'r');

while (!$csv->eof() && ($row = $csv->fgetcsv()) && $row[0] !== null) {
    var_dump($row);
}

This works, except that it has a problem with some non-ASCII characters. The CSV contains some German words, and my specific problem is with umlauts. An example of the kind of row it outputs:

array(5) {
    [0]=>
        string(6) "J¦rgen"
    [1]=>
        string(8) "Lastname"
    [2]=>
        string(14) "name@domain.de"
    [3]=>
        string(7) "Example"
    [4]=>
        string(7) "Example"
}

The ü in Jürgen is replaced with a ¦ character.

I've tried putting the following line before the code above:

mb_internal_encoding('UTF-8');

but it has had no effect.

Opening the CSV file in Vi shows the ü correctly, so the file on the server is intact.

Can anyone advise how to get PHP to handle German characters correctly when parsing a CSV?

Jack
  • Works for me. Assuming you are executing it in a terminal, which terminal encoding are you using? Try it via a browser. – hek2mgl Jul 03 '14 at 13:55
  • I was running it from the command line, but I've made some tweaks to run it from a browser and the same thing happens. The values from the CSV get put into a MySQL database table, which also doesn't get the umlauted characters. (Other PHP scripts in the same system, where data comes from an HTTP POST rather than a CSV file, handle umlauts and insert them into MySQL without issue.) – Jack Jul 03 '14 at 14:26
  • Try to convert the file to `utf-8` using `iconv` (on the command line). You'll need to find out the input encoding of the CSV file first. For that you'll need to have a look at the program which produces the CSV. If this is impossible, my best guess is Windows CP-1252. – hek2mgl Jul 03 '14 at 14:29
  • I originally was running this through PuTTY using UTF-8. – Jack Jul 03 '14 at 14:32
  • You wrote the csv manually? – hek2mgl Jul 03 '14 at 14:35
  • iconv worked, thanks! The command I used was: `iconv -f iso-8859-1 -t utf-8 originalFile.csv > modifiedFile.csv` The CSV file itself was supplied by a third party; it's not my own creation. – Jack Jul 03 '14 at 14:38
  • Good to know. I'll write a short answer... – hek2mgl Jul 03 '14 at 15:00
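Acting on the "find out the input encoding" advice from inside PHP is only partly possible. Whether data is valid UTF-8 can be checked reliably with mb_check_encoding(), but which single-byte encoding (ISO-8859-1, Windows-1252, ...) produced it cannot be determined from the bytes alone, because almost every byte value is valid in those encodings. The helper below is a hypothetical sketch of that check; its fallback is only the guess from the comments, not something PHP detects.

```php
<?php
// Hypothetical helper: reports 'UTF-8' if the sample is valid UTF-8,
// otherwise returns an assumed single-byte fallback encoding.
function guessCsvEncoding(string $sample, string $fallback = 'Windows-1252'): string
{
    // Plain ASCII is valid UTF-8 as well, so ASCII-only samples report 'UTF-8'.
    return mb_check_encoding($sample, 'UTF-8') ? 'UTF-8' : $fallback;
}

echo guessCsvEncoding("J\u{00FC}rgen"), PHP_EOL;            // UTF-8
echo guessCsvEncoding("J\xFCrgen"), PHP_EOL;                // Windows-1252
echo guessCsvEncoding("J\xFCrgen", 'ISO-8859-1'), PHP_EOL;  // ISO-8859-1
```

To use it on a file, pass the whole contents, e.g. file_get_contents($pathToFile); checking only a prefix can cut a multi-byte UTF-8 character in half and give a false "not UTF-8" result.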

1 Answer

The code as shown should work. My guess is that the problem is caused by the character encoding of the CSV file, which does not seem to be UTF-8. You need to find out the encoding of your input file.
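The string(6) in the question's var_dump output already hints at this: in UTF-8, "Jürgen" takes 7 bytes because ü is encoded as two bytes, whereas in a single-byte encoding such as ISO-8859-1 it takes 6. The sketch below (variable names are illustrative only) compares both byte sequences. It also shows why mb_internal_encoding() could not help: it only sets the default encoding that mb_* functions assume and does not re-encode the bytes fgetcsv() returns.

```php
<?php
// Illustrative field as fgetcsv() would return it from an ISO-8859-1 file,
// where "ü" is the single byte 0xFC.
$latin1 = "J\xFCrgen";
// The same name encoded as UTF-8, where "ü" is the two bytes 0xC3 0xBC.
$utf8 = "J\u{00FC}rgen";

// Only changes the default for mb_* functions; $latin1 keeps its bytes.
mb_internal_encoding('UTF-8');

echo strlen($latin1), ' ', bin2hex($latin1), PHP_EOL; // 6 4afc7267656e
echo strlen($utf8), ' ', bin2hex($utf8), PHP_EOL;     // 7 4ac3bc7267656e
var_dump(mb_check_encoding($latin1, 'UTF-8'));       // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));         // bool(true)
```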

Once you have found that out, you can convert the file to UTF-8 using the iconv command. (In the comments you said that the input encoding was ISO-8859-1.)

Example:

iconv -f 'iso-8859-1' -t 'utf-8' input.csv > utf8.csv
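If you would rather leave the third-party file untouched, roughly the same conversion can be done in PHP while reading. This is a sketch, not part of the original answer: it reuses the question's loop, assumes the source is ISO-8859-1 (as established in the comments), and converts each field with mb_convert_encoding(). readLatin1Csv() is a hypothetical helper name.

```php
<?php
// Sketch: read a CSV assumed to be ISO-8859-1 and return its rows as UTF-8.
function readLatin1Csv(string $pathToFile): array
{
    $rows = [];
    $csv = new SplFileObject($pathToFile, 'r');

    // Same loop as in the question; the CSV arguments are passed explicitly
    // (these are the defaults) to avoid deprecation notices on newer PHP.
    while (!$csv->eof() && ($row = $csv->fgetcsv(',', '"', '\\')) && $row[0] !== null) {
        // mb_convert_encoding(string, to, from) re-encodes the raw bytes.
        $rows[] = array_map(
            fn ($field) => mb_convert_encoding($field, 'UTF-8', 'ISO-8859-1'),
            $row
        );
    }
    return $rows;
}

// Usage with a throwaway ISO-8859-1 file ("ü" is the single byte 0xFC).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "J\xFCrgen,Lastname,name@domain.de\n");
$rows = readLatin1Csv($path);
unlink($path);
var_dump($rows[0][0]); // string(7) "Jürgen"
```

Converting after parsing is safe here because ISO-8859-1 is a single-byte encoding whose comma, quote and newline bytes are identical to ASCII, so fgetcsv() splits the fields correctly before the conversion.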

Attention! Never try to overwrite the file directly like this:

iconv -f 'iso-8859-1' -t 'utf-8' data.csv > data.csv

This would truncate data.csv and lead to complete data loss, because the shell opens and truncates the output file for the redirection before it runs iconv, so iconv reads an already empty file.
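To convert a file "in place" without that risk, write the converted data to a separate file first and replace the original only afterwards (in the shell, writing to a temporary file and then moving it over the original). A PHP sketch of the same idea, using a hypothetical convertFileToUtf8() helper and assuming ISO-8859-1 input that fits in memory:

```php
<?php
// Sketch: safely convert a whole file to UTF-8 by writing a temporary copy
// and only then replacing the original. Assumes the file fits in memory.
function convertFileToUtf8(string $path, string $from = 'ISO-8859-1'): void
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Temporary file in the same directory, so rename() stays on one filesystem.
    $tmp = tempnam(dirname($path), 'conv');
    if ($tmp === false || file_put_contents($tmp, mb_convert_encoding($data, 'UTF-8', $from)) === false) {
        throw new RuntimeException("Cannot write temporary file for $path");
    }
    // tempnam() creates the file as 0600; keep the original permissions instead.
    chmod($tmp, fileperms($path) & 0777);
    // The original is replaced only once the converted copy is complete.
    rename($tmp, $path);
}

// Usage with a throwaway ISO-8859-1 file.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "J\xFCrgen,Lastname\n");
convertFileToUtf8($path);
$converted = file_get_contents($path);
unlink($path);
echo $converted; // Jürgen,Lastname
```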

hek2mgl