Is it possible in PHP to compare these strings:

Æther == AEther == Aether

I'd like this comparison to evaluate to true.

I've actually tried multiple things but without real success:

  • Replacing Æ and any other special character with an ASCII equivalent such as Ae, using strtr (bad performance, and I would rather keep the string as is)

  • Using strcmp/strcasecmp: this solves the caps problem, but I still have trouble with non-ASCII UTF-8 characters

What I'm trying to achieve is to parse a list of elements retrieved from a JSON file and match them against another JSON file. Some names can be spelled differently (UTF-8 or not, different caps, etc.), so for now the only way I have found to do this is to maintain a third JSON file like this:

        {
            "match": {
                "name": "Unravel the \u00c6ther"
            },
            "replace": {
                "name": "Unravel the aether"
            }
        }

I then replace the base string with the correct one, but I'd like to find a way to automate the process.
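
For context, the manual approach described above could be applied roughly as follows. This is only a sketch: it assumes the third JSON file decodes to a list of match/replace objects, and the function name `applyReplacements` is hypothetical.

```php
<?php
// Hypothetical helper: applies manually maintained match/replace rules to a name.
// Assumes the rules decode to a list of {"match": {...}, "replace": {...}} entries.
function applyReplacements(string $name, array $rules): string
{
    foreach ($rules as $rule) {
        if ($rule['match']['name'] === $name) {
            return $rule['replace']['name'];
        }
    }
    return $name; // no rule matched, keep the original spelling
}

// Inline JSON stands in for the third file here.
$rules = json_decode(
    '[{"match": {"name": "Unravel the \u00c6ther"}, "replace": {"name": "Unravel the aether"}}]',
    true
);

echo applyReplacements("Unravel the \u{00C6}ther", $rules), PHP_EOL;
```

Every new spelling variant needs its own entry in the rules, which is the part that does not scale.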

kitensei
  • This is probably possible, but you'll have to write some code to do so. Please research this and tell us what you already tried. – kero May 06 '14 at 15:00
  • My bad! I've just edited the question with some details. – kitensei May 06 '14 at 15:08
  • General good practice: don't use `strtr()` or `str_replace()` to actually update the string if you don't want to modify the original string. Simply use the function when checking equivalence... `"Test"==str_replace("x","t", "Tesx")` – John Chrysostom May 06 '14 at 15:10
  • This answer from another post may be useful to you [Answer](https://stackoverflow.com/a/27680650/4607656) – Alejandro Morales May 07 '18 at 21:58

3 Answers

You can use iconv's transliterate feature:

iconv('utf8', 'ASCII//TRANSLIT', 'Æther') == 'Aether';

Some Windows systems may require 'utf-8' instead of 'utf8'.
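
A slightly more defensive sketch of the same idea, wrapped in a hypothetical helper named `translitEquals`. Two things here are assumptions rather than part of the answer above: the result is compared case-insensitively, because the transliteration may come out as `AE` rather than `Ae`, and the input is returned unchanged if `iconv()` fails. The output of `//TRANSLIT` depends on the iconv implementation and locale, so treat this as a starting point only.

```php
<?php
// Hypothetical helper: transliterate both strings to ASCII, then compare
// them case-insensitively.
function translitEquals(string $a, string $b): bool
{
    $toAscii = function (string $s): string {
        // iconv() returns false on failure; fall back to the input in that case.
        $out = function_exists('iconv')
            ? @iconv('UTF-8', 'ASCII//TRANSLIT', $s)
            : false;
        return $out === false ? $s : $out;
    };

    return strcasecmp($toAscii($a), $toAscii($b)) === 0;
}

var_dump(translitEquals('AEther', 'aether'));
// The next result depends on how the local iconv transliterates U+00C6.
var_dump(translitEquals("\u{00C6}ther", 'Aether'));
```

The original strings are never modified; the transliterated copies exist only for the comparison.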

Axeman
Marek

You need to write a function that does that. Here are two hints:

  • strtolower
  • levenshtein

These two functions can get you started ;)
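
One way the two hints might be combined, as a rough sketch. The helper name `fuzzyEquals` and the default threshold of 2 are assumptions chosen for illustration, not values from this answer.

```php
<?php
// Hypothetical helper: lower-case both strings, then accept a small edit distance.
// Note: strtolower() and levenshtein() operate on bytes, so a two-byte UTF-8
// character such as U+00C6 counts as two edits against "ae".
function fuzzyEquals(string $a, string $b, int $maxDistance = 2): bool
{
    return levenshtein(strtolower($a), strtolower($b)) <= $maxDistance;
}

var_dump(fuzzyEquals('AEther', 'Aether'));
var_dump(fuzzyEquals("\u{00C6}ther", 'Aether'));
var_dump(fuzzyEquals('Aether', 'Water'));
```

The threshold is a trade-off: a larger value tolerates more spelling variants but also starts matching short names that are genuinely different.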

Paladin

Why not use str_replace() to replace Æ with AE? Then use strtolower() to convert both strings to lower case and compare them.
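
A minimal sketch of that idea, assuming only the Æ/æ ligature needs handling. The helper name `normalizeName` is hypothetical, and the normalized value is used only for the comparison, so the original strings stay untouched.

```php
<?php
// Hypothetical helper: expand the ligature, then lower-case for comparison.
// Only U+00C6 and U+00E6 are handled; other characters need their own entries.
function normalizeName(string $s): string
{
    $s = str_replace(["\u{00C6}", "\u{00E6}"], ['AE', 'ae'], $s);
    return strtolower($s);
}

var_dump(normalizeName("\u{00C6}ther") === normalizeName('AEther'));
var_dump(normalizeName('AEther') === normalizeName('Aether'));
```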

John Chrysostom
  • Because you'd need to do about 5730 such replacements, give or take, to achieve good enough coverage. Most of those replacements will involve characters and languages that you have never seen. Doesn't sound like a practically usable approach to me. – Jon May 06 '14 at 15:04
  • To be fair, OP asked about getting an equivalence for one specific instance. He did not ask how to do this for any conceivable set of odd characters. – John Chrysostom May 06 '14 at 15:05
  • @JohnChrysostom It's likely OP's example was just that, an example, not the only use case – Damien Pirsy May 06 '14 at 15:06
  • Which is why more explanation and an example of what he's already tried is necessary... ;-) – John Chrysostom May 06 '14 at 15:07