3

I realize there's a dozen similar questions, but none of the solutions suggested there work in this case.

I have a PHP variable on a page, initialized as:

$hometeam="Крылья Советов";    //Cyrrilic string

When I print it out on the page, it prints out correctly. So echo $hometeam displays the string Крылья Советов, as it should.

The content meta tag in the header is set as follows:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

And, at the very beginning of the page, I have the following (as suggested in one of the solutions found in my search):

ini_set('default_charset', 'utf-8');

So that should be all good.

The MySQL table I'm trying to save this to, and the column in question, have utf8_bin as their encoding. When I go to phpMyAdmin and manually enter Крылья Советов, it saves properly in the field.

However, when I try to save it through a query on the page, using the following basic query:

mysql_query("insert into tablename (round,hometeam) values ('1','$hometeam') ");

The mysql entry looks like this:

c390c5a1c391e282acc391e280b9c390c2bbc391c592c391c28f20c390c2a1c390c2bec390c2b2c390c2b5c391e2809ac390c2bec390c2b2

So what's going on here? If everything is ok on the page, and everything is ok with MySQL itself, where is the issue? Is there something I should add to the query itself to make it keep the string UTF-8 encoded?

Note that I have set mysql_set_charset('utf8'); after connecting to the database (at the top of the page).

EDIT: Running the query SHOW VARIABLES LIKE "%character_set%" gives the following:

Variable_name   Value
character_set_client    utf8
character_set_connection    utf8
character_set_database  latin1
character_set_filesystem    binary
character_set_results   utf8
character_set_server    latin1
character_set_system    utf8
character_sets_dir  /usr/share/mysql/charsets/

Seems like there could be something here, since there are 2 latin1's in that list. What do you think?

Also, when I type a Cyrillic string directly into phpMyAdmin, it appears fine at first (it displays correctly after I save it). But reloading the table, it displays in HEX like the inserted ones. I apologize for the misinformation regarding this in the question. As it turns out, this should mean the problem is with phpMyAdmin or the database itself.

EDIT #2: this is what show create table tablename returns:

CREATE TABLE `tablename` (  `id` int(11) NOT NULL AUTO_INCREMENT,  `round` int(11),  `hometeam` varchar(32) COLLATE utf8_bin NOT NULL,  `competition` varchar(32) CHARACTER SET latin1 NOT NULL DEFAULT 'Russia',  PRIMARY KEY (`id`)) ENGINE=MyISAM AUTO_INCREMENT=119 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
sveti petar
  • 3,637
  • 13
  • 67
  • 144
  • did you do a query like: "SET NAMES 'utf8'" before? http://stackoverflow.com/questions/2159434/set-names-utf8-in-mysql – steven Jul 09 '13 at 16:23
  • @steven I hadn't done that, no, but now that I have the problem remains the same, so that didn't help. – sveti petar Jul 09 '13 at 16:41
  • @jovan what is the encoding of the source code file? Testing in a similar setup with Cyrillic encoding I only get a empty string. With utf8 everything goes as expected. – Salem Jul 09 '13 at 17:28
  • @Salem The encoding of the file is UTF-8 too. – sveti petar Jul 09 '13 at 19:18
  • @jovan just got an answer ready for your other question about the teams, too bad you deleted it. – Prix Jul 10 '13 at 09:52
  • @Prix Sorry about that, I found that the problem was not in the query - I had some faulty data that was causing the issue, so there was no point in troubling people further. – sveti petar Jul 10 '13 at 14:39
  • @jovan well my query removed most of your joins and simplified most of it regardless glad you figured it out. – Prix Jul 10 '13 at 21:10
  • @Prix Sounds interesting, could you perhaps put that on pastebin for me (if you still have it)? – sveti petar Jul 11 '13 at 15:32
  • the database row it is varchar or char? – Jorge Y. C. Rodriguez Jul 14 '13 at 09:52
  • aslo if you `print_r` the `$_POST` do you have the right values there? – Jorge Y. C. Rodriguez Jul 14 '13 at 09:53
  • @jycr753 It's varchar. There is no POST, it's not a form - I'm scraping the strings from an online resource. But if I type the string myself instead of scraping it, the issue is the same, so the source of the strings doesn't matter. – sveti petar Jul 14 '13 at 19:06
  • What version of PHP? When you say "*the mysql entry looks like ... `c390c5a1...`*", where do you see that? Is that what you get if you `SELECT HEX(hometeam) FROM tablename WHERE round = '1'` (and if not, what *do* you get in response to that query)? – eggyal Jul 15 '13 at 16:02

4 Answers4

2

Do you get this hex string in phpMyAdmin? I suppose when you SELECT the inserted value by e.g. PHP or the MySQL console client, you would be given the expected cyrillic UTF8 string.

If so, it's a configuration issue with phpMyAdmin, see e.g. here: http://theyouri.blogspot.ch/2010/12/phpmyadmin-collated-db-in-utf8bin-shows.html

phpMyAdmin collated db in utf8_bin shows hex data instead of UTF8 text

$cfg['DisplayBinaryAsHex'] = false;

Moreover, please don't use mysql_query that way, since you're totally open to SQL injections. I'm also not sure if you really want to use utf8_bin, see e.g. this discussion: utf8_bin vs. utf_unicode_ci or this: UTF-8: General? Bin? Unicode?

EDIT There's something weird going on. If you translate the given hex string to UTF8 characters, you get this: "ÐšÑ€Ñ‹Ð»ÑŒÑ Ð¡Ð¾Ð²ÐµÑ‚Ð¾Ð²" (see e.g. http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder). If you utf8_decode this, you get the desired "Крылья Советов". So, it seems that it's at least utf8 encoded twice (besides the problem that it somewhere shows up as hex characters).

Could you please provide the complete script? Do you utf8_encode your string anywhere? If your script is this and only this (besides a valid, opened MySQL connection):

<?php
$hometeam="Крылья Советов";    //Cyrrilic string
// open mysql connection here
mysql_set_charset('utf8');
mysql_query("INSERT INTO tablename (round, hometeam) VALUES ('1', '$hometeam')");
$result = mysql_query("SELECT * FROM tablename WHERE round = '1'");
$row = mysql_fetch_assoc($result);
echo $row['hometeam'];
?>

And you call the page, what is the result (in the page source of the browser, not what is displayed in the browser)?

Also, please check what happens if you change the collation to utf8_unicode_ci, as suggested in another answer here. That at least covers phpMyAdmin issues when displaying binary data and is propably anyway what you'll want (since you probably want ORDER BY clauses to perform as expected, see discussions in the SO questions I linked above).

EDIT2 Perhaps you could also provide some snippets like SHOW CREATE TABLE tablename or SHOW VARIABLES LIKE "%character_set%". Might help.

Community
  • 1
  • 1
stef77
  • 1,000
  • 5
  • 19
  • When I print out the values from the database (I've tried this with and without utf_encode) I don't get the proper Cyrillic text. As for sql injection, I'm aware of that, thanks. It's only an example. This is on shared hosting so I don't think I can configure phpMyAdmin that way. – sveti petar Jul 15 '13 at 13:57
  • To clarify: when you SELECT the value, you get the hex string you mentioned? Where and how do you "print out" the values from the database? – stef77 Jul 16 '13 at 20:43
  • Please have a look at my edited answer, perhaps we can find out more when you provide more information. – stef77 Jul 16 '13 at 21:14
  • Sorry about the delay, I'm on my phone now and I will do what you suggested in the morning. – sveti petar Jul 16 '13 at 21:24
  • I don't utf_encode the string anywhere in the PHP code. If I use the code snippet you suggested on the page, the correct Cyrillic string appears both on the page and in the page source. The complete page source code is here: http://pastebin.com/1RtAJ0RQ - the scraping itself is most of the script, maybe someone can try it on their own server. You're correct that there is no protection against injection, I removed it during testing to see if that, perhaps, had something to do with it. Working on your other suggestions now. – sveti petar Jul 17 '13 at 09:26
1

1) Try to save the entry to the database with the PhpMyAdmin and then also look at the result in PhpMyAdmin. Does it look OK? If yes, database is created and set up properly.

2) Try to use utf8_general_ci instead. This shouldn't matter, but give it a try.

3) Tune all necessary settings on the PHP side - follow this post: http://blog.loftdigital.com/blog/php-utf-8-cheatsheet . Especially try this trick:

echo htmlentities($hometeam, ENT_QUOTES, 'UTF-8')
Tomas
  • 57,621
  • 49
  • 238
  • 373
  • I think the problem is almost certainly on the database side - please check my edit to the question. – sveti petar Jul 17 '13 at 09:36
  • @jovan I read your update - then the problem certainly is in your table! Please report complete output of `show create table tablename`! – Tomas Jul 17 '13 at 09:46
  • I've added the output to the question. It looks OK to me - maybe the problem is in the overall MySQL configuration? Since it's shared hosting, perhaps, I could get the hosting supprot guys to help me out? By the way, I tried switching to utf8_general_ci, didn't change anything. – sveti petar Jul 17 '13 at 09:52
  • Never mind the above comment - when I changed the collation of both the table and the column, it started working fine. As a side note, any idea *why* this happened? Shouldn't utf_bin work for everything? – sveti petar Jul 17 '13 at 10:21
  • @jovan (you probably also referer to my other answer) - this is not very clear, however I think it is `utf8_general_ci` that should work for everything. – Tomas Jul 17 '13 at 10:27
1

As I saw in the comments, you don't seam to be able to update your database configuration isn't it?

I guess you have a misconfiguration of the encoding because I saw that in the official documentation MySQL Documentation

I can propose you a PHP solution. Because of a lot of encoding problem you can transform the string before inserting it inside database. You have to find a common language to talk between PHP and the database.

The one I tried in an other project consist in transform string using url_encode($string) and url_decode($string).

David Level
  • 353
  • 2
  • 16
1

Also, when I type a Cyrillic string directly into phpMyAdmin, it appears fine at first (it displays correctly after I save it). But reloading the table, it displays in HEX like the inserted ones.

This almost certainly looks like there is a problem in your table! Run show create table tablename. I bet there is latin1 instead of utf8, because you have it set as the default in the character_set_database variable.

To change this, run the following commmand:

ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;

This will convert all your varchar fields to utf8. But be careful with the records you already have in the table, as they are already malformed, if you converted them to UTF8 they will stay malformed. Maybe the best idea is to create the database again, just add the following commands at the end of table definition:

CREATE TABLE `tablename` (
    ....
) ENGINE=<whatever you use> DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
Tomas
  • 57,621
  • 49
  • 238
  • 373
  • @jovan, then try to change the utf8_bin to the utf8_general_ci. And beware that your column `competition` is still latin1. – Tomas Jul 17 '13 at 09:57
  • Yes, not that I've changed the collation of both the table and the column to utf8_general_ci, it works. Thanks! – sveti petar Jul 17 '13 at 10:18