Is there a way to compare two strings in Spanish regardless of the accents in PHP?

Question

Given the following PHP code, which compares two strings:

   $cadena1='JUAN LÓPEZ YÁÑEZ';
   $cadena2='JUAN LOPEZ YÁÑEZ';

   if($cadena1===$cadena2){
     echo '<p style="color: green;">The strings match!</p>';
   }else{
     echo '<p style="color: red;">The strings do not match. Accent sensitive?</p>';
   }

I notice, for example, that if I compare LOPEZ and LÓPEZ, the comparison returns false.

Is there an existing function, or some other way, to compare these strings regardless of the Spanish accents?

This might help, but it is ugly: http://ie2.php.net/strtr They are in fact different characters. You might want to build a dict that maps character A to character B and use it in a replace function. — Fallenreaper, Oct 22 '14 at 18:06
This may have been answered here: http://stackoverflow.com/questions/5782506/php-convert-foreign-characters-with-accents — Rich701, Oct 22 '14 at 18:08
You might want to check out http://stackoverflow.com/questions/10477213/regex-to-ignore-accents-php and http://stackoverflow.com/questions/3371697/replacing-accented-characters-php — Jias, Oct 22 '14 at 18:09

score 5 · Answer 1 · edited May 23 '17 at 11:50

The comparison returns false because the two strings are actually different sequences of bytes. To compare them, you need to normalize them in some way first.

The best way to do that is to use the Transliterator class, part of the intl extension on PHP 5.4+.

Test code:

<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
    $normalized = $transliterator->transliterate($e);
    echo $e. ' --> '.$normalized."\n";
}
?>

Result:

abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto

(Taken from my answer to "mySQL - matching latin (english) form input to utf8 (non-English) data".)

This replaces characters according to the tables of the ICU library, which are extremely complete and well-tested. Before removing the marks, it decomposes the string (NFD), so it handles every possible representation of characters like ñ, which can be encoded either as the single precomposed code point U+00F1 or as the letter n followed by the combining tilde U+0303.

Unlike soundex(), which is also very resource-intensive, this does not compare sounds, so it is more accurate.
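
As a minimal sketch of how the transliterator could be applied to the comparison in the question: the helper name equalsIgnoringAccents is hypothetical (it is not part of PHP), the type declarations assume PHP 7 or later, and the intl extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: compares two UTF-8 strings after stripping accents.
// Assumes the intl extension (Transliterator class) is available.
function equalsIgnoringAccents(string $a, string $b): bool
{
    // Same rule set as in the test code: decompose, drop nonspacing marks, recompose.
    $transliterator = Transliterator::createFromRules(
        ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
        Transliterator::FORWARD
    );

    return $transliterator->transliterate($a) === $transliterator->transliterate($b);
}

var_dump(equalsIgnoringAccents('JUAN LÓPEZ YÁÑEZ', 'JUAN LOPEZ YÁÑEZ')); // bool(true)
var_dump(equalsIgnoringAccents('JUAN LÓPEZ', 'JUAN PÉREZ'));             // bool(false)
```

Note that the comparison stays case-sensitive, and that the tilde of Ñ is removed as well, so Ñ and N are treated as equal.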

For me it's a +1, and this should be the accepted answer. — BenMorel, Nov 14 '19 at 16:49

score 4 · Accepted Answer · answered Oct 22 '14 at 18:06

I would replace all accents in your strings before comparing them. You can do that using the following code:

$replacements = array('Ó'=>'O', 'Á'=>'A', 'Ñ' => 'N'); //Add the remaining Spanish accents. 
$output = strtr("JUAN LÓPEZ YÁÑEZ",$replacements);

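$output will now be 'JUAN LOPEZ YANEZ'. Apply the same replacement to $cadena2 before comparing, since it contains accented characters too.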
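
A fuller version of the same idea might look like the sketch below. The helper name stripSpanishAccents is hypothetical, and the map only covers the precomposed UTF-8 forms of the accented vowels, ü and ñ; input that uses a base letter followed by a combining mark would pass through unchanged.

```php
<?php
// Hypothetical helper: maps precomposed Spanish accented letters to ASCII.
// Assumes a UTF-8 source file and UTF-8 input; combining marks are not handled.
function stripSpanishAccents(string $text): string
{
    $replacements = [
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ü' => 'U', 'Ñ' => 'N',
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ü' => 'u', 'ñ' => 'n',
    ];

    // With an array argument, strtr() replaces whole byte sequences,
    // so multibyte UTF-8 keys are matched correctly.
    return strtr($text, $replacements);
}

$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';

// Both sides are normalized before the comparison.
var_dump(stripSpanishAccents($cadena1));                                   // string(16) "JUAN LOPEZ YANEZ"
var_dump(stripSpanishAccents($cadena1) === stripSpanishAccents($cadena2)); // bool(true)
```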

answered Oct 22 '14 at 18:06 by ltalhouarne

So you plan to do that for all 60,000 characters of the Unicode table? – ItalyPaleAle Oct 22 '14 at 18:06
Nope, just for the 7 accents in the Spanish/French language. – ltalhouarne Oct 22 '14 at 18:07
Find a preexisting dict if you like. You aren't stripping a character, you are replacing it with another. So you need to build a dict of characters and their replacement values. – Fallenreaper Oct 22 '14 at 18:07
What about multiple representations? ñ, for example, can be represented in 2 different ways, with 2 different byte sequences. – ItalyPaleAle Oct 22 '14 at 18:08

score 1 · Answer 3 · answered Oct 22 '14 at 18:24

Why not just use a collation from the intl extension, via the Collator class?

With primary strength, the comparison ignores both accents and case.
With primary strength and the Collator::CASE_LEVEL attribute set to On, it ignores accents but not case.

(And so on; see the ICU or PHP documentation for details.)

$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';
$coll = new Collator('es_ES');
$coll->setStrength(Collator::PRIMARY);
//$coll->setAttribute(Collator::CASE_LEVEL, Collator::ON);
var_dump($coll->compare($cadena1, $cadena2)); // 0 = equals

(of course, the strings have to be UTF-8 encoded)
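
The two configurations described above can be contrasted in a short sketch. It assumes the intl extension is loaded and the strings are UTF-8; the variable names are only illustrative.

```php
<?php
// Assumes the intl extension (Collator class) is available.

// Primary strength: accents and case are both ignored.
$ignoreAccentsAndCase = new Collator('es_ES');
$ignoreAccentsAndCase->setStrength(Collator::PRIMARY);

// Primary strength plus case level: accents are ignored, case is not.
$ignoreAccentsOnly = new Collator('es_ES');
$ignoreAccentsOnly->setStrength(Collator::PRIMARY);
$ignoreAccentsOnly->setAttribute(Collator::CASE_LEVEL, Collator::ON);

var_dump($ignoreAccentsAndCase->compare('LÓPEZ', 'lopez') === 0); // bool(true)
var_dump($ignoreAccentsOnly->compare('LÓPEZ', 'LOPEZ') === 0);    // bool(true)
var_dump($ignoreAccentsOnly->compare('LÓPEZ', 'lopez') === 0);    // bool(false)
```

One caveat worth testing with real data: Spanish collation orders ñ as a letter of its own, so a collator created for a Spanish locale may still report Ñ and N as different even at primary strength.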

That's the point, isn't it? 1) The question was language-specific. 2) If you want the general case, feel free to use the "root" locale (UCA), as done by the grapheme_stri* functions since PHP 5.4.18 and 5.5.1. — julp, Oct 22 '14 at 19:02

jeroen · Answer 4 · 2014-10-22T18:17:14.780

You could try the soundex() function, that works at least for your example:

var_dump(soundex('LOPEZ'));
// string(4) "L120"

var_dump(soundex('LÓPEZ'));
// string(4) "L120"

You would have to test that for different words and if the results are not good enough, you could try similar_text().
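
A quick check along those lines, as a sketch: the name LOPES is an arbitrary example chosen to show that soundex() can also report a match for names that differ in more than an accent.

```php
<?php
// soundex() encodes how a word sounds, so distinct spellings can collide.
var_dump(soundex('LOPEZ') === soundex('LÓPEZ')); // bool(true): the accent is ignored
var_dump(soundex('LOPEZ') === soundex('LOPES')); // bool(true): so is the Z/S difference

// similar_text() returns the number of matching bytes and, through the
// optional third argument, a similarity percentage.
$matching = similar_text('LOPEZ', 'LOPES', $percent);
var_dump($matching); // int(4)
var_dump($percent);  // float(80)
```

Because similar_text() works on bytes, multibyte UTF-8 characters such as Ó count as two positions, which skews the percentage for accented input.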

answered Oct 22 '14 at 18:06 · edited Oct 22 '14 at 18:17

score 0 · Answer 5 · answered Oct 22 '14 at 18:34

Try this function from http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in-php. It replaces the non-ASCII characters in a string with their ASCII equivalents.

$cadena1='JUAN LÓPEZ YÁÑEZ';
$cadena2='JUAN LOPEZ YÁÑEZ';

function slugify( $text ) {

    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d]+~u', '-', $text);  
    $text = trim($text, '-');

    /**
     * //IGNORE//TRANSLIT to avoid errors on non translatable characters and still translate other characters
     * //TRANSLIT to out_charset transliteration is activated
     * //IGNORE, characters that cannot be represented in the target charset are silently discarded
    */
    $text = iconv('utf-8', 'ASCII//IGNORE//TRANSLIT', $text);
    $text = strtolower(trim($text));

    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);

    return empty($text) ? '' : $text;
}

var_dump( slugify( $cadena1 ) );    // string(16) "juan-lopez-yanez"
var_dump( slugify( $cadena2 ) );    // string(16) "juan-lopez-yanez"

This may behave unexpectedly with characters that have no ASCII equivalent even once the diacritics are removed, though. Think of Asian ones, for example. — ItalyPaleAle, Oct 22 '14 at 18:40

Yuseferi · Answer 6 · 2022-01-21T17:08:49.523

I had the same issue.

The following two strings compare as not equal:

    $var1= "Utilizar preferentemente la vacuna Td (toxoides tetánico y diftérico) o, si no está disponible, la vacuna TT (toxoide tetánico).";
    $var2 = "Utilizar preferentemente la vacuna Td (toxoides tetánico y diftérico) o, si no está disponible, la vacuna TT (toxoide tetánico).";
    if (strcmp($var1, $var2) == 0) {
        echo "they are Equal!";
    } else {
        echo "they are NOT Equal!";
    }

The result is "they are NOT Equal!".

I tried the suggested intl solution, but unfortunately it didn't work for me. The following solution did:

    $var1 = iconv('UTF-8','ASCII//TRANSLIT',$var1);
    $var2 = iconv('UTF-8','ASCII//TRANSLIT',$var2);
    if (strcmp($var1, $var2) == 0) {
        echo "they are Equal!";
    } else {
        echo "they are NOT Equal!";
    }

This time they are equal!
