Are there any solutions that will convert all foreign characters to their A-z equivalents? I have searched extensively on Google and could not find a solution, or even a list of characters and their equivalents. I want to display A-z-only URLs, and these characters cause plenty of other trip-ups as well.

  • As pointed out, a crude list of conversions for Latin-based alphabets would suffice. –  Aug 16 '09 at 15:55
  • There is now a Transliterator class in recent versions of PHP (PHP 5.4 and later) – bakytn Mar 04 '12 at 04:53
  • [The `Transliterator` Class](http://php.net/manual/class.transliterator.php) ([intl extension](http://php.net/manual/book.intl.php)) – hakre Apr 17 '12 at 12:10

10 Answers

You can use iconv, which has a special transliteration encoding.

When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.

-- http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

See here for a complete example that matches your use case.
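
As a rough sketch of what that looks like for URLs (the function name `iconv_slug` is made up for illustration, and the exact output for non-ASCII input depends on the iconv library and locale on your system):

```php
<?php
// Hypothetical helper: turn arbitrary UTF-8 text into an a-z0-9 URL slug.
function iconv_slug(string $text): string
{
    // //TRANSLIT asks iconv to approximate characters that ASCII lacks.
    // Depending on the iconv implementation, unknown characters may come
    // back as "?" or the call may fail and return false.
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    if ($ascii === false) {
        $ascii = '';
    }

    // Lowercase, then collapse every run of non-alphanumerics into one dash.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));

    return trim($slug, '-');
}

echo iconv_slug('Hello World!'), "\n"; // hello-world
```

Anything iconv could not approximate ends up as a dash here rather than as a stray "?" in the URL.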

troelskn
  • I had just stumbled across iconv as my research continued. Thank you very much for linking me to the complete example. –  Aug 16 '09 at 16:15
  • You should incorporate Shane O'Grady's answer – Quamis Oct 26 '11 at 11:41
  • You should never use iconv for this. Different systems can ship different iconv libraries, so the results are not guaranteed (more precisely, it is guaranteed that the results will differ), so beware! – Velda Jul 17 '18 at 18:12

If you are using iconv, make sure your locale is set correctly before you attempt the transliteration; otherwise some characters will not be transliterated correctly:

setlocale(LC_CTYPE, 'en_US.UTF8');
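
Locale names vary between systems, so it may be worth checking that the call actually succeeded. A small sketch (the helper name `use_utf8_ctype` and the candidate list are my own assumptions, adjust them to what `locale -a` reports on your machine):

```php
<?php
// Hypothetical helper: try several UTF-8 locale names and report which one
// was accepted. setlocale() returns false when none of them is installed.
function use_utf8_ctype(array $candidates): ?string
{
    $chosen = setlocale(LC_CTYPE, $candidates);

    return $chosen === false ? null : $chosen;
}

$locale = use_utf8_ctype(['en_US.UTF-8', 'en_US.utf8', 'C.UTF-8']);
if ($locale === null) {
    // Without a UTF-8 locale, iconv's //TRANSLIT may produce "?" for many letters.
    trigger_error('No UTF-8 locale available', E_USER_WARNING);
}
```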
Shane O'Grady

This will convert as many foreign characters as possible (including Cyrillic, Chinese, Arabic, etc.) to their A-z equivalents:

$AzString = transliterator_transliterate('Any-Latin;Latin-ASCII;', $foreignString);

You may need to install the PHP intl extension first.
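
For the URL use case in the question, the same call can be wrapped into a slug helper. This is only a sketch: the function name `intl_slug` is invented, and the romanisation you get for a given script depends on the ICU version behind the intl extension.

```php
<?php
// Hypothetical helper built on the intl extension's transliterator.
function intl_slug(string $text): string
{
    // Any-Latin romanises non-Latin scripts, Latin-ASCII reduces the result
    // to plain ASCII, and Lower() lowercases it.
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    if ($ascii === false) {
        throw new RuntimeException('Transliteration failed: ' . intl_get_error_message());
    }

    // Collapse everything that is not a-z or 0-9 into single dashes.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);

    return trim($slug, '-');
}

echo intl_slug('Hello World!'), "\n"; // hello-world
```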

Ilyich
  • Command for Debian (Ubuntu): `sudo aptitude install php5-intl`. Here you can find [complete function for nice filenames or URLs](http://www.jasom.net/php-prepare-sanitize-transliterate-convert-change-user-string-input-for-filename-or-url-address). – Jasom Dotnet Dec 12 '15 at 08:25

If you are stuck with a development and release environment that does not support PHP 5.4 or newer, you should use either iconv or a custom transliteration library.

I find iconv extremely unhelpful, especially with Arabic or Cyrillic alphabets. I would go for the Transliterator class built into PHP 5.4 or a custom transliteration class.

One of the solutions posted mentioned a custom library which I did not test.

When I was using Drupal, I loved its transliteration module so much that I recently ported it to make it usable without Drupal.

You can download it here and use it as follows:

<?php

include "JTransliteration.php";

$mombojombotext = "誓曰:『時日害喪?予及女偕亡。』民欲與之偕亡,雖有";
$nonmombojombotex = JTransliteration::transliterate($mombojombotext);

echo $nonmombojombotex;

?>
Kemal Dağ
  • +1 Found this very useful, but you have a notice: `Notice: Trying to get property of non-object in /path/JTransliteration.php on line 207`. I've fixed this issue to my specific needs ;) – Zuul Mar 13 '13 at 20:37
  • My fix was to delete the line 206, and replace the line 207 with `$langcode = "UTF-8";`! – Zuul Mar 13 '13 at 20:38
  • Here is another, more up-to-date port: https://github.com/Behat/Transliterator It can be installed using Composer. – jaywilliams Nov 22 '16 at 21:02

Note: I'm reposting this from another similar question in the hope that it's helpful to others.

I ended up writing a PHP library based on URLify.js from the Django project, since I found iconv() to be too incomplete. You can find it here:

https://github.com/jbroadway/urlify

It handles Latin characters as well as Greek, Turkish, Russian, Ukrainian, Czech, Polish, and Latvian.

Johnny Broadway

<?php
/**
 * @author bulforce[]gmail.com # 2011
 * Simple class that attempts to transliterate Bulgarian text written in Latin letters into Cyrillic
 */

// Usage:
// $text = "yagoda i yundola";
// $tl = new Transliterate();
// echo $tl->lat_to_cyr($text); // ягода и юндола

class Transliterate {

    // Single Latin letters and the Cyrillic letter each one stands for
    private $cyr_identical = array("а", "б", "в", "в", "г", "д", "е", "ж", "з", "и", "к", "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ъ", "я");
    private $lat_identical = array("a", "b", "v", "w", "g", "d", "e", "j", "z", "i", "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f", "h", "c", "y", "q");
    // Multi-letter Latin combinations that map to a single Cyrillic letter
    private $cyr_fricative = array("ж", "ч", "ш", "щ", "ц", "я", "ю", "я", "ю");
    private $lat_fricative = array("zh", "ch", "sh", "sht", "ts", "ia", "iu", "ya", "yu");

    public function __construct() {
        $this->identical_to_upper();
        $this->fricative_to_variants();
    }

    public function lat_to_cyr($str) {
        // strtr() with an array tries the longest key first and never rescans
        // replaced text, so "sht" wins over "sh" and "ya" wins over "y".
        $map = array_combine($this->lat_fricative, $this->cyr_fricative)
             + array_combine($this->lat_identical, $this->cyr_identical);

        return strtr($str, $map);
    }

    private function identical_to_upper() {
        foreach ($this->cyr_identical as $v) {
            $this->cyr_identical[] = mb_strtoupper($v, 'UTF-8');
        }

        foreach ($this->lat_identical as $v) {
            $this->lat_identical[] = mb_strtoupper($v, 'UTF-8');
        }
    }

    private function fricative_to_variants() {
        foreach ($this->lat_fricative as $k => $v) {
            // All letters upper case: "ZH" => "Ж"
            $this->lat_fricative[] = mb_strtoupper($v, 'UTF-8');
            $this->cyr_fricative[] = mb_strtoupper($this->cyr_fricative[$k], 'UTF-8');

            // One letter upper case at a time: "Zh" => "Ж", "zH" => "ж".
            // strlen() is fine here because the Latin keys are plain ASCII
            // (the original count($v) fails on a string in PHP 8).
            for ($i = 0; $i < strlen($v); $i++) {
                $variant = $v;
                $variant[$i] = strtoupper($v[$i]);
                $this->lat_fricative[] = $variant;
                if ($i == 0) {
                    $this->cyr_fricative[] = mb_strtoupper($this->cyr_fricative[$k], 'UTF-8');
                } else {
                    $this->cyr_fricative[] = $this->cyr_fricative[$k];
                }
            }
        }
    }

}
bulforce

For Composer users, there is Slugify:

https://github.com/cocur/slugify

use Cocur\Slugify\Slugify;
$slugify = new Slugify();
echo $slugify->slugify('Hello World!'); // hello-world

//You can also change the separator used by Slugify:
echo $slugify->slugify('Hello World!', '_'); // hello_world

//The library also contains Cocur\Slugify\SlugifyInterface. Use this interface whenever you need to type hint an instance of Slugify.
//To add additional transliteration rules you can use the addRule() method.
$slugify->addRule('i', 'ey');
echo $slugify->slugify('Hi'); // hey

Try this one

function Unaccent( $string ) {
    $transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);

    $normalized = $transliterator->transliterate($string);

    return $normalized;
}
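
Note that these rules only strip combining marks, so letters that have no decomposition are left alone. A quick sketch of the difference (requires the intl extension; the variable names are just for illustration):

```php
<?php
// Same rule string as in the function above: decompose, drop marks, recompose.
$unaccent = Transliterator::createFromRules(
    ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
    Transliterator::FORWARD
);

echo $unaccent->transliterate('Crème brûlée'), "\n"; // Creme brulee
echo $unaccent->transliterate('Straße'), "\n";       // Straße (ß has no accent to strip)

// ICU's built-in Latin-ASCII transform also replaces such letters.
$toAscii = Transliterator::create('Latin-ASCII');
echo $toAscii->transliterate('Straße'), "\n";        // Strasse
```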
Alin Razvan

The problem with your query is that it is a very hard thing to do. Not all glyphs in most languages have a-z equivalents. All glyphs have phonetic equivalents, but these are words, not letters. If you are dealing only with Latin-based languages, things are a little easier, but you still have issues with things like I-mutation.

Your best solution would be to come up with a crude list of phonetic sounds -> a-z equivalents. It won't be perfect, but without more information on your exact requirements it is hard to develop a solution.
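
To give an idea of what such a crude list could look like, here is a sketch. The table is deliberately incomplete and the choices in it are my own assumptions (for example, German would usually want "ä" => "ae"); it also assumes UTF-8 input with precomposed characters and the mbstring extension.

```php
<?php
// Illustrative, incomplete mapping for some European letters.
const CRUDE_ASCII_MAP = [
    'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'æ' => 'ae',
    'ç' => 'c', 'č' => 'c', 'ć' => 'c',
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ě' => 'e',
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
    'ñ' => 'n', 'ń' => 'n',
    'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'œ' => 'oe',
    'š' => 's', 'ś' => 's', 'ß' => 'ss',
    'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    'ý' => 'y', 'ÿ' => 'y',
    'ž' => 'z', 'ź' => 'z', 'ż' => 'z',
    'ł' => 'l', 'đ' => 'd',
];

// Hypothetical helper: lowercase first so the table only needs lowercase keys.
function crude_ascii_lower(string $text): string
{
    $lower = mb_strtolower($text, 'UTF-8');

    // strtr() with an array replaces whole keys, so multi-byte letters are safe.
    return strtr($lower, CRUDE_ASCII_MAP);
}

echo crude_ascii_lower('Crème brûlée, Straße, Łódź'), "\n"; // creme brulee, strasse, lodz
```

Characters that are not in the table pass through unchanged, so you would still need to strip or replace whatever is left before using the result in a URL.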

AAA
  • I am mostly dealing with European languages, so a rough solution would be fine. I once found a big list in the source of another script, but have completely lost it. –  Aug 16 '09 at 15:36

Nice libraries can be found at:

1) https://github.com/ashtokalo/php-translit (supports many languages, though some are missing)

2) https://github.com/fre5h/transliteration (only for Russian and Ukrainian)

T.Todua