1

I have a Mysql table with column State - the states are from across Europe - and the table and columns are in utf8_unicode_ci.

When I call the database I use

mysql_select_db($database_WTF, $WTF);
mysql_query('SET NAMES utf8');
$query_Recordset1 = "SELECT * 
     FROM newmeets
     WHERE newmeets.`State` IS NOT NULL 
     AND newmeets.`State` !=  ''
     ORDER BY newmeets.`State` ASC ";

I then run it though this simple loop

mysql_select_db($database_WTF, $WTF);
mysql_query('SET NAMES utf8');
$query_Recordset1 = "SELECT * 
     FROM newmeets
     WHERE newmeets.`State` IS NOT NULL 
     AND newmeets.`State` !=  ''
     ORDER BY newmeets.`State` ASC ";

$LastState = "";
do {
    echo
    var_dump($LastState == $row_Recordset1['State']);
    echo $row_Recordset1['State'];
    $LastState = $row_Recordset1['State'];
    var_dump($LastState == $row_Recordset1['State']);
    echo "<hr>";
} while ($row_Recordset1 = mysql_fetch_assoc($Recordset1));

The first time when it hits a new value as expected I get:

boolean false
BRUSSELS
boolean true

The next time I get:

boolean true
BRUSSELS
boolean true

Much as I would expect.

The problem comes when it hits non latin characters when each pass produces:

boolean false
Baden-Württember
boolean true

When I set them to be equal they are, and then when the next record is pulled (and each record has been entered identically) it fails, this is when I have used the same encoding, I actually need it to recognise they are the same even if the characters are entered differently.

Regardless of what is happening I have tried utf8_encode and Normalizer::normalize to get a true result from the comparison so I can use this in a control structure (e.g. if you've seen it last time don't print it this time) but it fails.

Deduplicator
  • 44,692
  • 7
  • 66
  • 118
Sapfizz
  • 13
  • 2
  • 1
    Can you include the output from `SHOW CREATE TABLE newmeets` in your question? It may well be that the charsets and collations on the table and columns are bogus. – Charles Dec 28 '12 at 10:59
  • [Provide hexdumps of the strings in question](http://stackoverflow.com/q/1057572/367456). – hakre Dec 28 '12 at 11:11
  • Charles - is this what you meant? -- Table structure for table `newmeets` -- CREATE TABLE IF NOT EXISTS `newmeets` ( `ID` int(11) NOT NULL auto_increment, `Country` text collate utf8_unicode_ci NOT NULL, `State` varchar(50) collate utf8_unicode_ci NOT NULL, PRIMARY KEY (`ID`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=110 ; – Sapfizz Dec 28 '12 at 15:41
  • Thanks Charles, the hex dumps were the way to view it, they showed me that although I was copying the values over they were still not the same ... back to the inputting stage. – Sapfizz Dec 28 '12 at 19:43
  • Not sure what I need to do to add kudos to your reputation? – Sapfizz Dec 28 '12 at 19:48

1 Answers1

0

It looks like you have a normalization problem in your database. Instead of storing the same state as string over and over again, put all state names into a table of it's own and reference them.

This will also ensure that you do not - e.g. by accident - put binary different but equally looking data into different rows you're not able to properly align later on as you just did.

Alternatively you should query distinct rows and update them, so that you do at least have the same binary string data for same-named states. E.g. if Mysql is able to actually align these state strings but PHP - due to it's binary nature of strings - is not.

hakre
  • 193,403
  • 52
  • 435
  • 836
  • Charles - is this what you meant? -- Table structure for table `newmeets` -- CREATE TABLE IF NOT EXISTS `newmeets` ( `ID` int(11) NOT NULL auto_increment, `Country` text collate utf8_unicode_ci NOT NULL, `State` varchar(50) collate utf8_unicode_ci NOT NULL, PRIMARY KEY (`ID`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=110 ; – Sapfizz Dec 28 '12 at 15:20
  • Sorry ignore above I meant to write SELECT COUNT( `State` ) AS "Count", `State` GROUP BY `State` ORDER BY `State` LIMIT 0 , 30 Count State 61 Baden-Württember 4 Bavaria 2 BRUSSELS 17 – Sapfizz Dec 28 '12 at 15:31
  • As only a small percentage of the table will actually be assigned a state I was hoping to avoid having to store them in separate table and reference them. All the current states have been updated to the exact same value by doing an update on them. – Sapfizz Dec 28 '12 at 15:44