1

I am using this code below to export my MySQL data into .CSV file. All working fine, but when I trying to export these letters ě, š, č, ř, ž, ý, á, í, é (Czech alphabet), the letters ě, ř, č are exported as ?. The other letters are exported fine.

Can you help me solve this problem please?

<?php
/*******EDIT LINES 3-8*******/
$DB_Server    = "xxx";                          //MySQL Server    
$DB_Username  = "xxx";                          //MySQL Username     
$DB_Password  = "xxx";                          //MySQL Password     
$DB_DBName    = "xxx";                          //MySQL Database Name  
$DB_TBLName   = "wp_comments";                           //MySQL Table Name
$DB_Query     = "comment_author, comment_content";       //MySQL Query (what to select from db, you can use * for all)
$filename     = "excelfilename";                         //File Name
$filename_columns = array("Autor", "Content");           //File Name of columns
/*******YOU DO NOT NEED TO EDIT ANYTHING BELOW THIS LINE*******/ 

//headers
header('Pragma: public');
header('Expires: 0');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Content-Description: File Transfer');
header('Content-Encoding: UTF-8');
header('Content-Type: text/csv; charset=UTF-8');
header('Content-Disposition: attachment; filename='.$filename.'.csv;');
header('Content-Transfer-Encoding: binary');  

//create MySQL connection  
mysql_connect($DB_Server,$DB_Username,$DB_Password);
mysql_select_db($DB_DBName);
$sql = "SELECT $DB_Query FROM $DB_TBLName";
$result = mysql_query($sql);

$fh = fopen('php://output', 'w');   
$fp = fwrite($fh, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF) )); // Write UTF-8 BOM
if($fp)
{
    fwrite($fh, "sep=\t" . PHP_EOL);   // Hint for MS Excel
    while($row = mysql_fetch_row($result)) {
        fputcsv($fh, $row, "\t");
    }
}
fclose($fh); 
pes502
  • 1,597
  • 3
  • 17
  • 32
  • Try to use class from this question to convert all string to utf-8. http://stackoverflow.com/questions/23140358/php-glob-directory-utf8/23140525#23140525 – Alexey Palamar Apr 18 '14 at 11:37
  • @eggyal yes I am sure, here is some screenshots: [Database content](http://i.imgur.com/zlx2vLc.png) and [Exported DB content](http://i.imgur.com/zWCit3A.png) – pes502 Apr 23 '14 at 04:38

2 Answers2

7

Since you're not explicitly setting the encoding of the database connection, the default encoding with which libmysql was compiled (usually latin1) will be used. In transcoding the resultset to that character set, MySQL replaces any characters which it cannot represent with ?.

To avoid this, you should be calling mysql_set_charset('utf8') after opening the database connection—see UTF-8 all the way through.

That said, you really shouldn't be using ext/mysql at all: it is now deprecated and the manual has contained warnings against its use in new code for almost three years. Consider MySQLi or PDO instead.

Finally, if the MySQL server is on the same machine as PHP and you have the FILE privilege, why not avoid handing the data to PHP altogether and simply use MySQL's SELECT ... INTO OUTFILE command to generate the output file?

//create MySQL connection  
$DB_DSN = "mysql:host=$DB_Server;dbname=$DB_DBName;charset=utf8";
new PDO($DB_DSN, $DB_Username, $DB_Password)->exec("
  SELECT $DB_Query
  INTO OUTFILE '/tmp/$filename.tsv'
  CHARACTER SET utf8
  FROM $DB_TBLName
");

echo "\xef\xbb\xbf"       // Write UTF-8 BOM
   , "sep=\t", PHP_EOL;   // Hint for MS Excel

readfile("/tmp/$filename.tsv");

Beware that you may need to ensure that the temporary file is not in use by a concurrent process.

PS: The format is only called CSV ("comma separated values") when the field delimiter is the comma character; when using a tab character as the field delimiter, the format is more correctly called TSV ("tab separated values") and should have the .tsv or .tab extension.

Community
  • 1
  • 1
eggyal
  • 122,705
  • 18
  • 212
  • 237
  • I cannot use this solution `INTO OUTFILE` in my webhosting – pes502 Apr 23 '14 at 04:45
  • @pes502: The above answer offers more than just that one solution. The second paragraph, in particular, explains how you can fix your current attempt. – eggyal Apr 23 '14 at 07:48
0

Although it is not recommended by the PHP documentation you could try to use the "old way" to set up the connection character set via SQLs: SET NAMES utf8; and then SET CHARACTER SET utf8; just after you've selected the database.

Note: the order of these two statements matter!

Edit #1

I've just noticed that in eggyal's answer the only problem you have is the lack of INTO OUTFILE privilege. You should try to use the method described in eggyal's second paragraph, that is using the mysql_set_charset('utf8') right after selecting your database instead.

Gábor Héja
  • 458
  • 7
  • 17