3

I want to give the user the ability to import a csv file into my php/mysql system, but ran into some problems with encoding when the language is russian which excel only can store in UTF-16 tab-coded tab files.

Right now my database is in latin1, but I will change that to utf-8 as described in question "a-script-to-change-all-tables-and-fields-to-the-utf-8-bin-collation-in-mysql"

But how should I import the file? and store the strings?

Should I for example translate it to html_entitites?

I am using the fgetcsv command to get the data out of the csv file. My code looks something like this right now.


file_put_contents($tmpfile, str_replace("\t", ";", file_get_contents($tmpfile)));
$filehandle = fopen($tmpfile,'r');
while (($data = fgetcsv($filehandle, 1000, ";")) !== FALSE) {
  $values[] = array(
    'id' => $data[0], 
    'type' => $data[1], 
    'text' => $data[4], 
    'desc' => $data[5], 
    'pdf' => $data[7]);
}

As note, if I store the xls file as csv in excel, i special chars are replaced by '_', so the only way I can get the russian chars out of the file, is to store the file in excel as tabbed seperated file in UTF16 format.

Jesper Grann Laursen
  • 2,357
  • 1
  • 20
  • 21

5 Answers5

4

Okay, the solution was to export the file from excel to UTF16 unicode text and add the ';' instaid of '\t' and convert from utf16 to utf8.

file_put_contents($tmpfile, str_replace("\t", ";",  iconv('UTF-16', 'UTF-8', file_get_contents($tmpfile))));

The table in mysql has to be changed from latin1 to utf8

ALTER TABLE  `translation` 
CHANGE  `text`  `text` VARCHAR( 100 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ,
CHANGE  `desc`  `desc` VARCHAR( 255 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL

And then the file could be imported as before.

When I want to export the data from the database to a excel file, the csv-version is not an option. It has to be done in excel's html mode. Where data is corrected by eg. urlencode() or htmlentities()

Here some example code.


<?php
header('Content-type: application/vnd.ms-excel');
header('Content-Disposition: attachment; filename="export.xls"');
print ('<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">
<div id="Classeur1_16681" align=center x:publishsource="Excel">
<table x:str border=0 cellpadding=0 cellspacing=0 width=100% style="border-collapse: collapse">');
for($i = 0 ; $i < count($lines) ; $i++) {
    print ('<tr><td>');
  print implode("</td><td>",$lines[$i]);
    print ('</td></tr>');
}
?>
</div>
</body>
</html>
Jesper Grann Laursen
  • 2,357
  • 1
  • 20
  • 21
0

I tried lots of alternative but the most easiest and rapid solution is to use Navicat

http://www.navicat.com/

enter image description here

Moxet Jan
  • 120
  • 8
0

Okay, my solution was ALSO to export the file from excel to UTF16 unicode text. The only difference was that I grab my file using a tab delimiter:

fgetcsv($fp, '999999', "\t", '"')
zmonteca
  • 2,304
  • 1
  • 26
  • 26
0

Alternatively you could make use of the MySQL load command. This command lets you specify delimiters, character set, etc. The one caveat is that the server loading the data must have direct visibility of the file, meaning that the file must reside on a filesystem visible and readable by the db server.

toluju
  • 4,097
  • 2
  • 23
  • 27
  • The load command does not support utf16 From the documentation: "Note that it is currently not possible to load data files that use the ucs2, utf16, or utf32 character set." – Jesper Grann Laursen May 22 '09 at 09:25
  • I missed that part of the documentation, sorry. :( Sounds like MySQL generally has trouble with utf16, so you may need to convert from utf16 to utf8 in your code. As per a question that has already been asked on SO (http://stackoverflow.com/questions/155514/how-to-convert-a-utf-8-string-to-a-utf-16-string-in-php), the mb_convert_encoding function can help you with this (http://www.php.net/manual/en/function.mb-convert-encoding.php) – toluju May 22 '09 at 19:53
  • may i ask you to have a look at this question on a related topic http://stackoverflow.com/questions/11116963/bangla-language-not-displayed-in-the-unicoded-csv-file – Istiaque Ahmed Jun 20 '12 at 10:10
0

I would not import it using PHP. Instead consider creating a temporary table to store your data using READ DATA INFILE.

$file_handle = fopen($file_name, 'r');
$first_row = fgetcsv($file_handle, 0, ',', '"');
fclose($file_handle);
# Your usual error checking
if (!is_array($first_row)) {
    ...
}
$columns = 'column'.implode(' TEXT, column', array_keys($first_row)).' TEXT';
query("CREATE TABLE $table ($columns) Engine=MyISAM DEFAULT CHARSET=ucs2");
query("LOAD DATA LOCAL INFILE '$file_name' INTO TABLE $table ...

Then you can do whatever you want with the data in that table.

soulmerge
  • 73,842
  • 19
  • 118
  • 155
  • About the note of the charset not being supported: I would actually try it out, I think that phrase just means that no conversion can be done while loading the data. It should be a simple copy operation dumping a bunch of bytes into columns, which should work – soulmerge May 22 '09 at 09:52
  • may i ask you to have a look at this question on a related topic http://stackoverflow.com/questions/11116963/bangla-language-not-displayed-in-the-unicoded-csv-file ? – Istiaque Ahmed Jun 20 '12 at 10:07