I am trying to read a CSV file and echo its content, but the special characters are displayed incorrectly.

Mäx Müstermänn -> Mäx Müstermänn

Encoding of the CSV file is UTF-8 without BOM (checked with Notepad++).

This is the content of the CSV file:

"Mäx";"Müstermänn"

My PHP script

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>
</body>
</html>

I tried to use setlocale(LC_ALL, 'de_DE.utf8'); as suggested here, without success. The content is still displayed incorrectly.

What am I missing?

Edit:

An echo mb_detect_encoding($data[$c],'UTF-8'); gives me UTF-8 UTF-8.

echo file_get_contents("specialchars.csv"); gives me "Mäx";"Müstermänn".

And

print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))

gives me

Array ( [0] => Mäx [1] => Müstermänn )

What does it mean?

testing
  • What happens when you do echo file_get_contents("specialchars.csv")? What happens when you do print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))? – Furgas Jan 16 '12 at 18:06
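
The checks suggested in the comment above show what PHP prints, not which bytes are actually in the file. A small sketch for inspecting the raw bytes, assuming the mbstring extension is loaded; the helper name describeBytes is made up for illustration. In UTF-8, ä is the two bytes c3 a4, while in ISO-8859-1 it is the single byte e4:

```php
<?php
// Hypothetical helper: show the raw bytes of $s as hex and report
// whether they form a valid UTF-8 sequence.
function describeBytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
    return bin2hex($s) . ' (' . $valid . ')';
}

echo describeBytes("\xC3\xA4"), "\n"; // c3a4 (valid UTF-8)
echo describeBytes("\xE4"), "\n";     // e4 (not valid UTF-8)
```

Calling describeBytes(fgets($handle)) on the first line of specialchars.csv would show which of the two forms the file really contains.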

6 Answers

75

Try this:

<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $data = array_map("utf8_encode", $data); // added: convert each field from ISO-8859-1 to UTF-8
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>
robsonsanches
  • This totally removed the special characters with space, which is totally dangerous!!! – Clain Dsilva May 27 '15 at 04:47
  • @robssanches the above code works only for alphabetic characters; it does not work with other languages, e.g. Chinese, Hindi, Hebrew, etc. – Sachin Sarola Aug 17 '18 at 12:28
  • This worked for me. So sad, that this helpful line is missing in official documentation http://de.php.net/manual/de/function.fgetcsv.php – Peter Jan 31 '19 at 17:52
  • I am having some trouble with this solution... Some characters as ’ (right single quote mark) and … (ellipsis) are not working with utf8_encode – Loenix Jan 31 '21 at 22:11
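
A note on the utf8_encode approach above: utf8_encode() always treats its input as ISO-8859-1, so it double-encodes data that is already UTF-8 and cannot produce Windows-1252 characters such as ’ or …. It is also deprecated as of PHP 8.2. A minimal sketch of an alternative, assuming the mbstring extension is available and that any input which is not valid UTF-8 is Windows-1252; both that fallback encoding and the helper name toUtf8 are assumptions, not something stated in the question:

```php
<?php
// Hypothetical helper: return $value as UTF-8, converting only when needed.
function toUtf8(string $value, string $fallback = 'Windows-1252'): string
{
    // Already valid UTF-8 (this includes plain ASCII): leave untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Otherwise assume the fallback encoding and convert from it.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

$latin1 = "M\xE4x";     // "Mäx" encoded as ISO-8859-1 / Windows-1252
$utf8   = "M\xC3\xA4x"; // "Mäx" encoded as UTF-8

var_dump(toUtf8($latin1) === $utf8); // bool(true)
var_dump(toUtf8($utf8) === $utf8);   // bool(true)
```

Inside the read loop this could replace the utf8_encode call, e.g. $data = array_map('toUtf8', $data);. The validity check is only a heuristic: a few single-byte sequences happen to be valid UTF-8 as well, so knowing the real source encoding is always preferable.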
18

I encountered a similar problem: parsing a CSV file with special characters like é, è, ö, etc.

The following worked fine for me:

To display the characters correctly on the HTML page, this header was needed:

header('Content-Type: text/html; charset=UTF-8');

In order to parse every character correctly, I used:

utf8_encode(fgets($file));

Don't forget to use the 'Multibyte String Functions' in all subsequent string operations, like:

mb_strtolower($value, 'UTF-8');
user2992220
9

In my case the source file has windows-1250 encoding and iconv prints tons of notices about illegal characters in input string...

So this solution helped me a lot:

/**
 * getting CSV array with UTF-8 encoding
 *
 * @param   resource    &$handle
 * @param   integer     $length
 * @param   string      $separator
 *
 * @return  array|false
 */
private function fgetcsvUTF8(&$handle, $length, $separator = ';')
{
    if (($buffer = fgets($handle, $length)) !== false)
    {
        $buffer = $this->autoUTF($buffer);
        return str_getcsv($buffer, $separator);
    }
    return false;
}

/**
 * automatic conversion of a windows-1250 or iso-8859-2 string into utf-8
 *
 * @param   string  $s
 *
 * @return  string
 */
private function autoUTF($s)
{
    // detect UTF-8
    if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s))
        return $s;

    // detect WINDOWS-1250
    if (preg_match('#[\x7F-\x9F\xBC]#', $s))
        return iconv('WINDOWS-1250', 'UTF-8', $s);

    // assume ISO-8859-2
    return iconv('ISO-8859-2', 'UTF-8', $s);
}

Response to @manvel's answer - use str_getcsv instead of explode - because of cases like this:

some;nice;value;"and;here;comes;combinated;value";and;some;others

explode will split the string into these parts:

some
nice
value
"and
here
comes
combinated
value"
and
some
others

but str_getcsv will split the string into these parts:

some
nice
value
and;here;comes;combinated;value
and
some
others
Petr Hladík
  • Great answer! This is the only one that actually deals with the wrong character encoding issue when manipulating CSV data with PHP. Either you encode your data properly before manipulating it, or you convert it on the fly while reading. In my case, `fgetcsv` was returning broken output (nothing, not even NULL or FALSE) without any PHP notice, because of the encoding mismatch. You just saved me precious time with `fgetcsvUTF8`, because I had no way to re-encode the original data. I hate encoding issues. Thanks for sharing! – EricLavault Jan 07 '20 at 10:23
  • This works really well. I have encountered one use case where it doesn't work. Not sure if you have any thoughts on it: `Åland Islands` - a row with that text in it will return `?land Islands` using your function. Aside from that though I spotted no issues – Lawrence Johnson Feb 01 '23 at 23:52
  • Thank you for answering. Please describe how you managed to solve this problem. – Petr Hladík Feb 02 '23 at 12:23
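
The difference between explode and str_getcsv described in this answer can be reproduced with the same sample line. This is only an illustrative sketch; the enclosure and escape arguments are passed explicitly so the result does not depend on default values:

```php
<?php
$line = 'some;nice;value;"and;here;comes;combinated;value";and;some;others';

// explode() knows nothing about quoting, so it also splits inside the quotes.
$naive = explode(';', $line);

// str_getcsv() honours the enclosure character and keeps the quoted field whole.
$parsed = str_getcsv($line, ';', '"', '\\');

echo count($naive), "\n";  // 11
echo count($parsed), "\n"; // 7
echo $parsed[3], "\n";     // and;here;comes;combinated;value
```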
8

Try putting this into the top of your file (before any other output):

<?php

header('Content-Type: text/html; charset=UTF-8');

?>
Andreas Stokholm
  • If I put this on top I get �. – testing Jan 17 '12 at 12:55
  • Perhaps I should mention that I upload the csv file through a form with `enctype="multipart/form-data" accept-charset="utf-8"`. If I put your code into the example then it seems to work. – testing Jan 17 '12 at 13:34
  • @testing that made a difference for me. Had 2 CSV's I was parsing, one had the accept-charset="utf-8" and the other didn't, and it didn't display correctly until I used this. – AutoBaker Apr 23 '20 at 15:38
5

The problem is that the function reports the data as UTF-8 (you can check this with mb_detect_encoding) but does not actually convert it, so the bytes are merely treated as UTF-8. It is therefore necessary to convert the data from its original encoding (Windows-1251, also known as CP1251) to UTF-8 using iconv. Since fgetcsv returns an array, I suggest writing a custom function that reads a line, converts it, and then splits it:

function customfgetcsv(&$handle, $length, $separator = ';'){
    if (($buffer = fgets($handle, $length)) !== false) {
        return explode($separator, iconv("CP1251", "UTF-8", $buffer));
    }
    return false;
}
Manvel
2

Now I got it working (after removing the header command). I think the problem was that the PHP file itself was encoded in ISO-8859-1. I set it to UTF-8 without BOM. I thought I had already done that, but perhaps I accidentally undid the change.

Furthermore, I used SET NAMES 'utf8' for the database. Now it is also correct in the database.

testing
  • If the imported file is of another charset than your code you may also need setlocale(). – tim May 12 '13 at 00:16