4

My question is: given the following PHP code to compare two strings:

   $cadena1='JUAN LÓPEZ YÁÑEZ';
   $cadena2='JUAN LOPEZ YÁÑEZ';

   if($cadena1===$cadena2){
     echo '<p style="color: green;">The strings match!</p>';
   }else{
     echo '<p style="color: red;">The strings do not match. Accent sensitive?</p>';
   }

I notice, for example, that comparing LOPEZ and LÓPEZ evaluates to false.

Is there a built-in function, or some other way, to compare these strings regardless of the Spanish accents?

Pathros
  • this might help, but is ugly. http://ie2.php.net/strtr They are in fact different characters. You might want to build a dict to point character A to character B in a replace function. – Fallenreaper Oct 22 '14 at 18:06
  • May have been answered here: http://stackoverflow.com/questions/5782506/php-convert-foreign-characters-with-accents – Rich701 Oct 22 '14 at 18:08
  • You might want to check out http://stackoverflow.com/questions/10477213/regex-to-ignore-accents-php and http://stackoverflow.com/questions/3371697/replacing-accented-characters-php – Jias Oct 22 '14 at 18:09

6 Answers

5

The two strings compare as not equal because they are actually different sequences of bytes. To compare them, you need to normalize them in some way.

The best way to do that is to use the Transliterator class, part of the intl extension on PHP 5.4+.

Test code:

<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
    $normalized = $transliterator->transliterate($e);
    echo $e. ' --> '.$normalized."\n";
}
?>

Result:

abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto

(taken from my answer here: mySQL - matching latin (english) form input to utf8 (non-English) data )

This replaces characters according to the tables of the ICU library, which are extensive and well tested. Before removing the marks, the rules decompose the string (NFD), so the result is the same however a character like ñ is encoded: as the single precomposed code point U+00F1, or as n followed by the combining tilde U+0303.

Unlike soundex(), which is also very resource-intensive, this does not compare sounds, so it is more accurate.
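
Applied to the strings from the question, the same rules can be wrapped in a comparison helper. This is only a sketch: it assumes the intl extension is loaded and PHP 7+ (for the type declarations), and the names stripAccents() and equalsIgnoringAccents() are made up for illustration.

```php
<?php
// Hypothetical helper: removes combining marks using the ICU rules shown above.
function stripAccents(string $text): string
{
    static $transliterator = null;
    if ($transliterator === null) {
        $transliterator = Transliterator::createFromRules(
            ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
            Transliterator::FORWARD
        );
    }
    return $transliterator->transliterate($text);
}

// Hypothetical helper: both sides are normalized before the strict comparison.
function equalsIgnoringAccents(string $a, string $b): bool
{
    return stripAccents($a) === stripAccents($b);
}

$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';

var_dump(equalsIgnoringAccents($cadena1, $cadena2)); // bool(true)
```

Both arguments go through the same transliteration, so it does not matter which of the two strings carries the accents.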

ItalyPaleAle
4

I would replace all accents in your strings before comparing them. You can do that using the following code:

$replacements = array('Ó'=>'O', 'Á'=>'A', 'Ñ' => 'N'); //Add the remaining Spanish accents. 
$output = strtr("JUAN LÓPEZ YÁÑEZ",$replacements);

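$output is now 'JUAN LOPEZ YANEZ'. Because $cadena2 also contains accented characters (Á and Ñ), pass it through the same strtr() call before comparing the two.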
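
A fuller version of this idea might look like the sketch below. The function name stripSpanishAccents() is invented for the example, the map covers only the accented letters used in Spanish, and it assumes the input is UTF-8 with precomposed characters (a decomposed "n" plus combining tilde would not be matched).

```php
<?php
// Hypothetical helper: maps Spanish accented letters to their ASCII base letters.
function stripSpanishAccents(string $text): string
{
    $replacements = [
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ü' => 'U', 'Ñ' => 'N',
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ü' => 'u', 'ñ' => 'n',
    ];
    return strtr($text, $replacements);
}

$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';

// Normalize both sides, then compare strictly.
var_dump(stripSpanishAccents($cadena1) === stripSpanishAccents($cadena2)); // bool(true)
```

This needs no extension, at the cost of maintaining the replacement table by hand.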

ltalhouarne
1

Why not just use collation from the intl extension, via the Collator class?

  • with primary strength, to ignore accents and case
  • with primary strength and the Collator::CASE_LEVEL attribute set to Collator::ON, to ignore accents but not case

(and so on - see ICU or PHP documentation for details)

$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';
$coll = new Collator('es_ES');
$coll->setStrength(Collator::PRIMARY);
//$coll->setAttribute(Collator::CASE_LEVEL, Collator::ON);
var_dump($coll->compare($cadena1, $cadena2)); // 0 = equals

(of course, the strings have to be UTF-8 encoded)
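
To make the difference between the two settings concrete, here is a small sketch (assuming the intl extension is available; the variable names are just for illustration). It compares the same pairs first at primary strength and then with the case level switched on.

```php
<?php
$coll = new Collator('es_ES');
$coll->setStrength(Collator::PRIMARY);

// Primary strength only: accents and case are both ignored.
$accentAndCaseIgnored = $coll->compare('López', 'LOPEZ');

// Primary strength plus case level: accents ignored, case still significant.
$coll->setAttribute(Collator::CASE_LEVEL, Collator::ON);
$accentIgnored = $coll->compare('LÓPEZ', 'LOPEZ');
$caseDiffers   = $coll->compare('López', 'LOPEZ');

var_dump($accentAndCaseIgnored === 0); // bool(true)
var_dump($accentIgnored === 0);        // bool(true)
var_dump($caseDiffers !== 0);          // bool(true)
```

Only the sign of a non-zero result carries meaning, so the sketch checks for zero versus non-zero rather than a specific value.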

julp
  • That's the point, isn't it? 1) The question was language-specific. 2) If you want a general case, feel free to use the "root" locale (UCA), as done by the grapheme_stri* functions since PHP 5.4.18 and 5.5.1. – julp Oct 22 '14 at 19:02
0

You could try the soundex() function, which works at least for your example:

var_dump(soundex('LOPEZ'));
// string(4) "L120"

var_dump(soundex('LÓPEZ'));
// string(4) "L120"

You would have to test that for different words and if the results are not good enough, you could try similar_text().

See an example with your code.

jeroen
0

Try this function from http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in-php. It replaces non-ASCII characters in the string with ASCII characters.

$cadena1='JUAN LÓPEZ YÁÑEZ';
$cadena2='JUAN LOPEZ YÁÑEZ';

function slugify( $text ) {

    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d]+~u', '-', $text);  
    $text = trim($text, '-');

    /**
     * //IGNORE//TRANSLIT to avoid errors on non translatable characters and still translate other characters
     * //TRANSLIT to out_charset transliteration is activated
     * //IGNORE, characters that cannot be represented in the target charset are silently discarded
    */
    $text = iconv('utf-8', 'ASCII//IGNORE//TRANSLIT', $text);
    $text = strtolower(trim($text));

    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);

    return empty($text) ? '' : $text;
}

var_dump( slugify( $cadena1 ) );    // string(16) "juan-lopez-yanez"
var_dump( slugify( $cadena2 ) );    // string(16) "juan-lopez-yanez"
Danijel
  • May have unexpected behaviours with characters that are not part of the ASCII table (not even without diacritics), though. Think of Asian ones, for example. – ItalyPaleAle Oct 22 '14 at 18:40
0

I had the same issue:

The following messages compare as not equal.

    $var1= "Utilizar preferentemente la vacuna Td (toxoides tetánico y diftérico) o, si no está disponible, la vacuna TT (toxoide tetánico).";
    $var2 = "Utilizar preferentemente la vacuna Td (toxoides tetánico y diftérico) o, si no está disponible, la vacuna TT (toxoide tetánico).";
    if (strcmp($var1, $var2) == 0) {
        echo "they are Equal!";
    } else {
        echo "they are NOT Equal!";
    }

The result is "they are NOT Equal!".

I tried the intl solution mentioned above, but unfortunately it didn't work for me. The following solution did help, though.

    $var1 = iconv('UTF-8','ASCII//TRANSLIT',$var1);
    $var2 = iconv('UTF-8','ASCII//TRANSLIT',$var2);
    if (strcmp($var1, $var2) == 0) {
        echo "they are Equal!";
    } else {
        echo "they are NOT Equal!";
    }

This time they are equal!

Yuseferi