
I would like to know how I could encode Russian/Hindi/Chinese... characters into UTF-8. I know about the functions utf8_decode/utf8_encode, but they only work with ISO-8859-1.

Is there a more generic function in PHP that does the same task with any kind of character? Which one should I use if I want to encode/decode Russian characters?

I have also tried this: mb_convert_encoding($string, 'UTF-8', 'CP1251');

But it doesn't work; it converts Екатеринка into Екатеринка

EDIT:

The script I'm using is a very simple form that the user fills in to store some information in a database:

<?php header('Content-Type: text/html; charset=utf-8'); //To specify to the browser the kind of content


$con = mysql_connect('**host**', '**user**', '**pass**');
mysql_select_db('encoding_test', $con);
mysql_set_charset('utf8', $con);

if (isset($_POST['submitted'])) { // isset() avoids an undefined-index notice on the first (GET) load

    //<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> this should go up


    $name2= $_POST['name'];
    echo $name2."<br>";


    /* The name I'm inserting in the form is the following one
    $name="Екатеринка";
    */

    require_once('mysqli_connect.php');

    $q="INSERT INTO USERS (name,pass) VALUES ('$name2' ,'pass')";


    $r = @mysqli_query($dbc, $q); //Here we run the query

    if($r)
    {

        echo 'Everything OK '.$q.'<br>';

    }else{

        echo 'Something wrong<br>';
        echo '<p>'.mysqli_error($dbc).'<br /><br />Query:'.$q.'</p>';

    }



}// End of the "submitted" block, to be removed once testing is finished

?>

<html xml:lang="en" lang="en">
    <head>

        <title>Register Form</title>

    </head>
    <body>
        <h1>Register Form</h1>
        <form action="Main_menu.php" accept-charset="utf-8" method="post">

            <p>First Name: <input type="text" name="name" size="15" maxlength="20"  /></p>
            <p>Password: <input type="password" name="pass" size="15" maxlength="20"  /></p>

            <p><input type="submit" name="submit" value="Register" /></p>
            <input type="hidden" name="submitted" value="TRUE" />

        </form>
    </body>
</html>

The code to create the table in the database is:

require_once('../mysqli_connect.php');

//We create now the USERS table
$q = "CREATE TABLE USERS (
    user_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(30) NOT NULL,
    email VARCHAR(80) NOT NULL,
    pass VARCHAR(30) NOT NULL,
    PRIMARY KEY (user_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8";

$r = @mysqli_query($dbc, $q); //Here we run the query

When I execute the script, everything goes fine: it connects to the database and the displayed message is:

Everything OK INSERT INTO USERS (name) VALUES ('Екатеринка')

But when I look in the database, the information stored is: Екатеринка... However, if I copy and paste "INSERT INTO USERS (name) VALUES ('Екатеринка')" into the database's own SQL prompt and press Enter, the information stored is Екатеринка.

Before reading deceze's post (kunststube.net/frontback): I think the problem is not in my script, because the Cyrillic characters are displayed correctly, and it is not in the database, because they are stored correctly if I use its own SQL prompt. So the problem has to be in the connection between the browser and the database.

Should I try something else besides:

$con = mysql_connect('**host**', '**user**', '**pass**');
mysql_select_db('encoding_test', $con);
mysql_set_charset('utf8', $con);

(In my script I'm using the real connection details in place of the **host**, **user** and **pass** placeholders.)

EDIT 2:

Well, I added a few more lines, just to check how the information is retrieved from the database:

$q="SELECT name FROM USERS WHERE pass='pass'"; 

$r=@mysqli_query($dbc, $q);

$row=mysqli_fetch_array($r, MYSQLI_ASSOC);

echo "We get from the Database: ".$row['name']."<br>";

And the result was: "We get from the Database: Екатеринка"

So even though the information is stored wrongly in the table, at least it can be retrieved correctly.
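A minimal sketch of one possible explanation, which is not confirmed anywhere in the thread: mysql_set_charset('utf8', $con) only applies to the $con link opened with the old mysql_ extension, while the INSERT and SELECT run on $dbc from mysqli_connect.php. Assuming that file never sets a character set, $dbc would use MySQL's default latin1, which for these particular bytes behaves like Windows-1252. The UTF-8 bytes sent by the form would then be read as cp1252 characters and re-encoded into the utf8 column, and the SELECT would undo the same conversion on the way back. The sketch imitates both steps with mb_convert_encoding, without any database:

```php
<?php
// Simulates a UTF-8 string travelling over a connection that MySQL
// believes is latin1 (assumption: latin1 behaves like Windows-1252 here).

$sent = "Екатеринка"; // UTF-8 bytes, as submitted by the form

// Storing: each byte is read as a cp1252 character and re-encoded
// as UTF-8 for the utf8 column (double encoding).
$stored = mb_convert_encoding($sent, 'UTF-8', 'Windows-1252');

// Retrieving over the same latin1 connection reverses the conversion,
// so PHP receives the original UTF-8 bytes again.
$fetched = mb_convert_encoding($stored, 'Windows-1252', 'UTF-8');

echo $stored, "\n";  // prints Екатеринка
echo $fetched, "\n"; // prints Екатеринка
```

If this is what is happening, setting the character set on the connection that actually runs the queries, for example with mysqli_set_charset($dbc, 'utf8'), should make the stored value and the fetched value agree.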

Ignacio Alorre
  • What encoding are these strings in originally?! "Hindi" is not an encoding a string can be in. – deceze Aug 11 '13 at 11:32
  • Read http://kunststube.net/encoding and http://kunststube.net/frontback. – deceze Aug 11 '13 at 11:33
  • I don't really know what the original encoding is; I just copied the text from somewhere else. But when I use mb_detect_encoding($name), where $name="Екатеринка", it returns UTF-8, but I'm not sure this is the correct encoding. – Ignacio Alorre Aug 11 '13 at 18:10
  • It doesn't matter where you copy it from, it matters that your app is set up to handle Unicode correctly. Read both the above articles and [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – deceze Aug 11 '13 at 18:34
  • I have been reading the two posts you sent me, both very useful, but something is still going wrong when I try to store information in the database. I'm going to edit my main question with the changes I have made to the script and the new things I have learnt thanks to your blog. – Ignacio Alorre Aug 12 '13 at 14:55
  • Which version of MySQL are you using? – Joni Aug 12 '13 at 15:39
  • The host where I have all this stored uses MySQL 5.1 – Ignacio Alorre Aug 12 '13 at 15:43
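On the mb_detect_encoding doubt raised in the comments, here is a small sketch using mb_check_encoding, which gives a direct yes/no answer for one named encoding. It illustrates that a "UTF-8" verdict only means the bytes are well-formed UTF-8: double-encoded mojibake passes the same check, so such a test cannot tell whether the characters are the intended ones. The Windows-1251 and Windows-1252 variants are produced by the sketch itself for comparison.

```php
<?php
// mb_check_encoding() reports whether bytes are valid in one encoding;
// it says nothing about whether they spell the intended text.

$good     = "Екатеринка";                                        // proper UTF-8
$cp1251   = mb_convert_encoding($good, 'Windows-1251', 'UTF-8'); // single-byte legacy form
$mojibake = mb_convert_encoding($good, 'UTF-8', 'Windows-1252'); // UTF-8 misread as cp1252, re-encoded

var_dump(mb_check_encoding($good, 'UTF-8'));     // bool(true)
var_dump(mb_check_encoding($cp1251, 'UTF-8'));   // bool(false)
var_dump(mb_check_encoding($mojibake, 'UTF-8')); // bool(true), although the text is garbled
```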

1 Answer

The iconv and mb_convert_encoding functions convert strings from one encoding to another. For example, to convert text from ISO-8859-2 to UTF-8 you can use either:

$text = iconv("ISO-8859-2", "UTF-8", $text);
$str = mb_convert_encoding($str, "UTF-8", "ISO-8859-2");

To use them you have to know the original encoding though.
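A short sketch of why the source encoding matters, using the Cyrillic word from the question. Since the thread contains no genuine Windows-1251 input, the sketch first produces such bytes itself (standing in for, say, a hypothetical legacy file). Converting them with the correct source encoding restores the text, while declaring input that is already UTF-8 as CP1251 reinterprets every byte and garbles it:

```php
<?php
$utf8 = "Екатеринка"; // assumes this file is saved as UTF-8

// Hypothetical legacy input: the same word as Windows-1251 bytes,
// one byte per Cyrillic letter.
$cp1251 = mb_convert_encoding($utf8, 'Windows-1251', 'UTF-8');
echo strlen($cp1251), " bytes in CP1251, ", strlen($utf8), " bytes in UTF-8\n";

// Correct: the declared source encoding matches the actual bytes.
$restored = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');
echo $restored, "\n"; // Екатеринка

// Wrong: the input is already UTF-8, so calling it CP1251 turns each
// UTF-8 byte into its own character.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1251');
var_dump($garbled === $utf8); // bool(false)
```

The first line printed is "10 bytes in CP1251, 20 bytes in UTF-8", which is one quick hint about which encoding a Cyrillic string is really in.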

To see UTF-8 encoded text in the browser send the content-type header:

header("Content-Type: text/html; charset=UTF-8");
Joni
  • Oops, sorry, I took too long to edit my question, so I'll reply here with the result of an experiment I did with mb_convert_encoding($string, 'UTF-8', 'CP1251'). With that function I converted Екатеринка into Екатеринка. I think Екатеринка is written using CP1251, but I think Екатеринка is not UTF-8. What am I doing wrong? – Ignacio Alorre Aug 11 '13 at 09:43
  • It seems you are viewing the result of the conversion as if it were encoded in cp1251, not in UTF-8. Send the content-type header to set the page encoding to UTF-8. – Joni Aug 11 '13 at 09:58
  • Thanks Joni, but I think the problem is a little bigger. I tried your recommendation and a few other things, but I still have problems. I have updated my question with the changes I have made to the script, but it still doesn't work. – Ignacio Alorre Aug 12 '13 at 15:13