Basically, if two strings would evaluate as the same in my database I'd also like to be able to check that at the application level. For example, if somebody enters "bjork" in a search field, I want PHP to be able to match that to the string "Björk" just as MySQL would.

I'm guessing PHP has no direct equivalent to MySQL's collation options, and that the easiest thing to do would be to write a simple function that converts the strings, using strtolower() to make them uniformly lower-case and strtr() to replace multi-byte characters with their corresponding ASCII equivalents.

Is that an accurate assumption? Does anybody have a fool-proof array handy to use as the second parameter of strtr() for conforming strings as various MySQL collations would do (specifically for my current needs, utf8_general_ci)? Or, lacking that, where could I find documentation of exactly how the different collations in MySQL treat various characters? (I saw somewhere that in some collations ß is treated as S and in others as Ss, for instance, but it didn't outline every character evaluation.)

Thor
  • It is possible to run a MySQL query and tell MySQL which collation to use for the strings passed to it, so that the comparison runs on the MySQL server. It might not be very fast, but it would give the exact behavior. – hakre Dec 15 '11 at 02:40
  • I should add that efficiency is of utmost importance. – Thor Dec 15 '11 at 04:21

3 Answers

Here's what I've been using, but I have yet to test it for complete consistency with MySQL.

function collation_conform($string,$collation='utf8_general_ci')
{

    if($collation === 'utf8_general_ci')
    {
        if(!is_string($string))
            return $string;

        // Fold accented Latin letters to ASCII (not yet verified against MySQL).
        // MySQL documents ß = s for utf8_general_ci; ß = ss applies to utf8_unicode_ci.
        $string = strtr($string, array(
            'Š'=>'S', 'š'=>'s', 'Ð'=>'D', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 
            'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 
            'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 
            'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'s', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 
            'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 
            'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
            'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'));

        // The collation is case-insensitive, so return the lower-case form.
        return strtolower($string);
    }
    else die('Unsupported Collation (collation_conform() collation_helper.php)');
}
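As a rough illustration of how a conforming function like this can be used for array-key lookups, here is a minimal sketch. The names conform_key() and build_index() are hypothetical, and the mapping is a deliberately tiny stand-in for the full table above, so treat it as an outline rather than a tested match for MySQL.

```php
<?php
// Hypothetical helper: a reduced stand-in for collation_conform()
// that only knows a handful of mappings.
function conform_key(string $s): string
{
    $map = array('ö' => 'o', 'Ö' => 'O', 'é' => 'e', 'É' => 'E');
    // strtr() with an array replaces whole keys, so multi-byte keys are safe.
    return strtolower(strtr($s, $map));
}

// Group the original strings under their conformed form.
function build_index(array $names): array
{
    $index = array();
    foreach ($names as $name) {
        $index[conform_key($name)][] = $name;
    }
    return $index;
}

$index = build_index(array('Björk', 'Beyoncé', 'bjork'));

// Conform the search term the same way before looking it up.
$matches = $index[conform_key('BJORK')] ?? array();
```

Conforming each stored string once and reusing the result as a key avoids calling the function on every comparison, which may matter when efficiency is the priority.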
Thor

Have you looked at PHP's Collator class? http://www.php.net/manual/en/class.collator.php

Adrian Cornish
  • That's interesting. I didn't know that existed. Alas, I could not find out from the documentation which settings would behave the same way as MySQL. Also, I guess I'm more interested in being able to conform strings, which would give me the ability to compare a string to an array key, for example. – Thor Dec 19 '11 at 01:15

Try the following code.

$s1 = 'Björk';
$s2 = 'bjork';

var_dump(
    is_same_string($s1, $s2)
);

function is_same_string($str, $str2, $locale = 'en_US')
{
    $coll = collator_create($locale);
    collator_set_strength($coll, Collator::PRIMARY);  
    return 0 === collator_compare($coll, $str, $str2);
}
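If the goal is a conformed string that can be used as an array key rather than a pairwise comparison, the same Collator settings can produce a sort key through Collator::getSortKey(). The sketch below assumes the intl extension is loaded; primary_sort_key() is a hypothetical name, and the 'en_US' locale follows ICU rules, so it is not guaranteed to agree with utf8_general_ci on every character.

```php
<?php
// Hypothetical helper; assumes the intl extension is available.
function primary_sort_key(string $s, string $locale = 'en_US'): string
{
    $coll = new Collator($locale);
    // PRIMARY strength ignores case and accent differences.
    $coll->setStrength(Collator::PRIMARY);

    $key = $coll->getSortKey($s);
    if ($key === false) {
        throw new RuntimeException('Could not compute a sort key');
    }
    // The raw key is binary; hex-encode it to get a printable array key.
    return bin2hex($key);
}

if (class_exists('Collator')) {
    $albums = array(primary_sort_key('Björk') => 'Homogenic');
    var_dump(isset($albums[primary_sort_key('bjork')]));
}
```

Strings that compare as equal at this strength get identical keys, so the key can be stored once and reused for lookups.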
masakielastic
  • It was "How to emulate MySQLs utf8_general_ci collation [...]" and you answered with `$locale = 'en_US'`. Are you sure these two are equal? In utf8_general_ci 'a'='ą' but 'L'!='Ł'... – Kalmar Feb 07 '18 at 14:03