0

I'm trying to write a PHP script that loops through a CSV and inserts the values from the CSV to a MySQL table. The characters look fine in the CSV, but when I insert them into the database, they show up asç‹‚å¹² and variants.

I have the PDO connection set to a UTF-8 charset, the table itself is set to UTF-8, and when I dump the characters, they show as the correct Chinese characters, as long as I set the UTF-8 header. However, they go all wonky on insert, and remain so when selecting them back out. If I use a GUI (Navicat) to insert a particular phrase, it selects out fine.

$mysql = new PDO('mysql:host=1.2.3.4;dbname=db;charset=UTF-8', 'username', 'password');

header('Content-type: text/html; charset=utf-8');

$handle = fopen('bannedWords.csv', 'r');
$count = 0;
$word = '';
$insert = $mysql->prepare("INSERT INTO sensitivePhrases SET phrase = :word");
$insert->bindParam(':word', $word);
while (($data = fgetcsv($handle)) !== false) {
    $word = $data[1];
    var_dump($word);
    $insert->execute();
}

I've tried using uft8_encode and utf8_decode, as well as mb_convert_encoding, none of which have helped. I'd love some advice on where I'm going wrong.

Rohit
  • 3,018
  • 2
  • 29
  • 58

2 Answers2

2

In your MySQL you need to specify utf8mb4 because while UTF-8 in the rest of the universe is a 4-byte index, in MySQL it's only a 3-byte index, so only displays a small subset of all UTF-8 characters.

$mysql = new PDO('mysql:host=1.2.3.4;dbname=db;charset=utf8mb4', 'username', 'password');

You need to apply utf8mb4 to both your connection as well as your destination table.

Read the Excellent StackOveflow post about UTF-8 here

Community
  • 1
  • 1
Martin
  • 22,212
  • 11
  • 70
  • 132
0

Do not use any form of encode/decode function.

Do not read the lines one at a time; instead use LOAD DATA ... CHARACTER SET = utf8mb4 ...;

If the data is not arranged correctly for the new schema, either fiddle with @variables inside the LOAD statement or load into a tmp table, then do some transforms.

Meanwhile, you should not need to convert characters anywhere; simply specify the correct character set everywhere.

If you have more trouble, let's see the hex of some characters (together with what you expect). And let's see LOAD ... and SHOW CREATE TABLE.

Rick James
  • 135,179
  • 13
  • 127
  • 222