I'm trying to save a wordlist to my database. The wordlist is a UTF-8 encoded text file. Here is my table structure:

CREATE TABLE IF NOT EXISTS `wordlist` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `kelime` char(64) COLLATE utf8_turkish_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `kelime` (`kelime`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_turkish_ci AUTO_INCREMENT=1140209 ;

Here is the PHP code that adds the words:

<?php
ini_set('max_execution_time', 3000);
$handle = @fopen("wordlist.txt", "r");

if ($handle) {
    include("ekle.php"); // makes db connection
    $sql = "insert into wordlist (kelime) VALUES (?)";
    $dbh->beginTransaction();
    $sth = $dbh->prepare($sql);
    while (($buffer = fgets($handle, 4096)) !== false) {
        $sth->execute(array(trim($buffer)));
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
        $dbh->rollBack();
    } else {
        $dbh->commit();
    }
    fclose($handle);
}?>

My database's default collation is also utf8_turkish_ci. When I add the words, my wordlist looks wrong in phpMyAdmin.

What am I doing wrong here?

yasar
  • Try to put everything in UTF-8 (the database and the connection to the database). See this [link](http://stackoverflow.com/questions/14083847/accented-characters-in-utf-8-mysql-table-output/14083925#14083925). – Vucko Jan 05 '13 at 19:14
  • You never told PHP/PDO that the PHP<->MySQL connection should be UTF-8, so it probably defaulted to ISO-8859-1 or something similar and mangled your text. The ENTIRE pipeline has to be UTF-8 for this sort of thing to work: browser<->PHP<->MySQL. If any single stage anywhere uses a different charset, without the appropriate charset translation routines in between, you end up with garbage. – Marc B Jan 05 '13 at 19:15

1 Answer

I have faced this problem before and resolved it with the following steps.

Step #1 : SET THE CHARSET TO UTF-8 IN THE HEAD SECTION

First of all, the browser needs to know that the page is going to display or use Unicode. So, go to your <head> section and set the charset to utf-8, so that the browser can show the Unicode text without errors. You can copy and paste the line below:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
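The <meta> tag can also be backed up by the HTTP Content-Type header, which browsers give precedence over the tag. A minimal sketch, assuming the page is generated by PHP; htmlContentType() is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an HTML page.
function htmlContentType(string $charset = 'utf-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers must be sent before any output; headers_sent() guards a late call.
if (!headers_sent()) {
    header(htmlContentType());
}
```

The header has to be sent before the first byte of HTML, so this belongs at the very top of the script.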

Step #2 : CREATING THE DATABASE

When you create (a) the database and (b) any table in it, set the collation of both to utf8_unicode_ci (or a language-specific UTF-8 collation such as utf8_turkish_ci, which the table in the question already uses). This is easy to do in phpMyAdmin.

Step #3 : DATABASE INITIALIZATION

When you initialize the database connection, add the following “extra lines”:

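<?php

    // Note: the mysql_* extension used here was deprecated in PHP 5.5 and
    // removed in PHP 7; mysqli and PDO offer equivalent calls.
    define('HOSTNAME', 'localhost');
    define('USERNAME', 'database_user_name');
    define('PASSWORD', 'database_password');
    define('DATABASE', 'database_name');

    $dbLink = mysql_connect(HOSTNAME, USERNAME, PASSWORD);
    // Ask the server to send result sets as UTF-8.
    mysql_query("SET character_set_results=utf8", $dbLink);
    // Make PHP's multibyte string functions work in UTF-8.
    mb_language('uni');
    mb_internal_encoding('UTF-8');
    mysql_select_db(DATABASE, $dbLink);
    // Set the client, connection, and results character sets in one statement.
    mysql_query("set names 'utf8'", $dbLink);

?>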

Why add the extra lines? They tell the database server which character set the client sends and which one it expects back. Without them, the connection falls back to a default that may not be UTF-8.
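The question uses PDO rather than the mysql_* functions, so the same connection setup might look like the sketch below. It relies on the charset option of the PDO MySQL DSN, which is honored since PHP 5.3.6. The helper names utf8Dsn() and connectUtf8() are hypothetical, and the host, database name, and credentials are the placeholders from the code above:

```php
<?php
// Hypothetical helper: builds a MySQL DSN that requests a UTF-8 connection.
function utf8Dsn(string $host, string $database): string
{
    return "mysql:host={$host};dbname={$database};charset=utf8";
}

// Hypothetical helper: opens the connection and makes PDO throw on errors.
function connectUtf8(string $host, string $database, string $user, string $password): PDO
{
    return new PDO(utf8Dsn($host, $database), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Usage (placeholder credentials):
// $dbh = connectUtf8('localhost', 'database_name', 'database_user_name', 'database_password');
```

On PHP older than 5.3.6 the charset option is ignored; there, passing PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8" in the options array is the usual substitute.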

Step #4 : INSERTING INPUTS/DATA IN THE DATABASE

<?php

    mysql_query("SET character_set_client=utf8", $dbLink);
    mysql_query("SET character_set_connection=utf8", $dbLink);

    $sql_query = "INSERT INTO
    TABLE_NAME(field_name_one, field_name_two)
    VALUES('field_value_one', 'field_value_two')";
    mysql_query($sql_query, $dbLink);

?>

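Why add the first two lines? They tell the server that the statement it is about to receive is encoded in UTF-8. If SET NAMES 'utf8' has already run on this connection, as in Step #3, they repeat what it set and are harmless.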
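For the wordlist in the question, which is loaded through PDO, the insert step might be sketched as follows. readWords() and insertWords() are hypothetical helpers; the table and column names come from the question, and $dbh is assumed to be a PDO connection opened with a UTF-8 charset and PDO::ERRMODE_EXCEPTION:

```php
<?php
// Hypothetical helper: reads a word list, one word per line, keeping only
// non-empty lines that are valid UTF-8.
function readWords(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    $words = [];
    while (($line = fgets($handle)) !== false) {
        $word = trim($line);
        if ($word !== '' && mb_check_encoding($word, 'UTF-8')) {
            $words[] = $word;
        }
    }
    fclose($handle);
    return $words;
}

// Hypothetical helper: inserts the words in one transaction through a
// prepared statement, so no value is interpolated into the SQL text.
function insertWords(PDO $dbh, array $words): void
{
    $sth = $dbh->prepare('INSERT INTO wordlist (kelime) VALUES (?)');
    $dbh->beginTransaction();
    try {
        foreach ($words as $word) {
            $sth->execute([$word]);
        }
        $dbh->commit();
    } catch (Exception $e) {
        // Undo the partial insert, then let the caller see the error.
        $dbh->rollBack();
        throw $e;
    }
}

// Usage, assuming $dbh is an open UTF-8 PDO connection:
// insertWords($dbh, readWords('wordlist.txt'));
```

Checking each line with mb_check_encoding() catches a file that is not actually UTF-8 before it reaches the database, which separates file problems from connection problems.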

Step #5 : UPDATING INPUTS/DATA IN THE DATABASE

<?php

    mysql_query("SET character_set_client=utf8", $dbLink);
    mysql_query("SET character_set_connection=utf8", $dbLink);

    // $id is assumed to hold an integer validated earlier, e.g. $id = (int) $id;
    $sql_query = "UPDATE TABLE_NAME
    SET field_name_one='field_value_one', field_name_two='field_value_two'
    WHERE id='$id'; ";
    mysql_query($sql_query, $dbLink);

?>

So, you add the same two lines before you run the query string, because the values contain Unicode text.

Step #6 : SEARCHING DATA IN THE DATABASE

<?php

    mysql_query("SET character_set_results=utf8", $dbLink);

    // $id is assumed to hold an integer validated earlier, e.g. $id = (int) $id;
    $sql_query = "SELECT * FROM TABLE_NAME WHERE id='$id'; ";
    $dbResult = mysql_query( $sql_query, $dbLink);

?>

Adding this one extra line before you query your Unicode data is enough.
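To confirm that the statements above took effect, the session's character set variables can be inspected. A sketch under assumptions: nonUtf8Variables() is a hypothetical helper, and $dbh in the usage comment stands for an open PDO connection; the same SHOW VARIABLES query can be sent through any other MySQL client:

```php
<?php
// Hypothetical helper: given rows of [Variable_name, Value] as returned by
// SHOW VARIABLES, lists the connection charset variables that are not UTF-8.
function nonUtf8Variables(array $rows): array
{
    $watched = [
        'character_set_client',
        'character_set_connection',
        'character_set_results',
    ];
    $bad = [];
    foreach ($rows as $row) {
        list($name, $value) = $row;
        // Values starting with "utf8" (utf8, utf8mb3, utf8mb4) are accepted.
        if (in_array($name, $watched, true) && strpos($value, 'utf8') !== 0) {
            $bad[$name] = $value;
        }
    }
    return $bad;
}

// Usage, assuming $dbh is an open PDO connection:
// $rows = $dbh->query("SHOW VARIABLES LIKE 'character_set_%'")->fetchAll(PDO::FETCH_NUM);
// print_r(nonUtf8Variables($rows)); // an empty array means all three are UTF-8
```

Only the three connection-level variables are checked, since they are the ones SET NAMES changes; server and database defaults do not affect how the bytes are interpreted in transit.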

That's it; you are almost done.

I think this should help you resolve your problem.

John Peter