3

I have to convert a URL like "você-é-um-ás-da-aviação" to "voce-e-um-as-da-aviacao" to make it easier to read on the SERP.

I could use a plain character-by-character replacement, but I don't really like having to list each and every character: I find it clunky, and I want to keep language-specific characters out of the source code as much as I can.

Is it possible? Is it viable?

Jonathan DS
  • 4
    http://stackoverflow.com/questions/2654131/replace-diacritic-characters-with-equivalent-ascii-in-php – Muhammad Abrar Feb 10 '12 at 14:12
  • 1
    Duplicate: http://stackoverflow.com/questions/3542717/how-to-transliterate-accented-characters-into-plain-ascii-characters – entropid Feb 10 '12 at 14:15

4 Answers

3
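function url_safe($string) {
    // The output of iconv's TRANSLIT mode depends on the current locale.
    setlocale(LC_ALL, 'fr_FR'); // change to the one of your language
    $url = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    // Replace each run of characters other than letters, digits, and "_" with a hyphen.
    $url = preg_replace('~[^\\pL0-9_]+~u', '-', $url);
    // Drop leading and trailing hyphens, then lowercase the result.
    $url = trim($url, '-');
    return strtolower($url);
}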
EPP
2

You could use the canonical decomposition mappings published by the Unicode Consortium (the files in http://www.unicode.org/Public/UNIDATA/).

However, this is not as simple as you seem to think it is. Believe it or not, there is a "kcal" symbol (U+3389) whose compatibility decomposition is four characters long.

You may also wish to consult the numeric equivalents tables there, as a "circled number seven" should probably map to the ASCII numeral seven, and so forth.
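
A minimal sketch of the decomposition approach, assuming PHP's intl extension is installed (it provides the `Normalizer` class, which is not mentioned above). The function name `strip_marks` is hypothetical. It applies compatibility decomposition (NFKD), which turns an accented letter into a base letter followed by combining marks, and then deletes the marks:

```php
<?php
// Hypothetical helper: decompose the text, then drop the combining marks.
// Assumes the intl extension is loaded (it provides the Normalizer class).
function strip_marks(string $text): string
{
    // NFKD splits "c with cedilla" into "c" + U+0327 and also expands
    // compatibility characters such as U+3389 into "kcal".
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
    if ($decomposed === false) {
        // The input was not valid UTF-8; return it untouched.
        return $text;
    }
    // \p{Mn} matches nonspacing marks, i.e. the detached accents.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

echo strip_marks("voc\u{00EA}-\u{00E9}-um-\u{00E1}s-da-avia\u{00E7}\u{00E3}o"), "\n";
// prints: voce-e-um-as-da-aviacao
```

Letters that have no decomposition at all, such as "ø" or "ß", pass through this function unchanged, so it does not guarantee pure ASCII output on its own.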

I strongly advise against this strategy, however - you're butchering your text for little gain, and can't recover the original input once you've transformed it.

Borealid
0

I suggest you map every special character and its replacement into an array and then replace the text with a regex.
I know you said that you do not want to use a common replacement, but it's the only viable way to do it. You could filter the special characters out (by checking whether their character code falls outside the ASCII range), but that only removes them; it does not substitute the correct replacement.
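
As a rough sketch of the lookup-table approach: the array below is a deliberately partial, hypothetical map covering only the characters in the question's example, and `replace_accents` is an illustrative name. It uses `strtr` rather than a regex, since `strtr` accepts the whole map in one call, and it writes the keys as `\u{...}` escapes (PHP 7.0+) so that no accented characters appear literally in the source:

```php
<?php
// Hypothetical helper with a partial map: extend it for the languages you need.
function replace_accents(string $text): string
{
    $map = [
        "\u{00E1}" => 'a', // a with acute
        "\u{00E3}" => 'a', // a with tilde
        "\u{00E7}" => 'c', // c with cedilla
        "\u{00E9}" => 'e', // e with acute
        "\u{00EA}" => 'e', // e with circumflex
    ];
    // strtr replaces every key found in $text with its mapped value.
    return strtr($text, $map);
}

echo replace_accents("voc\u{00EA}-\u{00E9}-um-\u{00E1}s-da-avia\u{00E7}\u{00E3}o"), "\n";
// prints: voce-e-um-as-da-aviacao
```

Any character missing from the map is left as it is, so the table has to cover every character you expect to see.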

gion_13
0

You could use a combination of iconv to get your string as ASCII then some preg_replace to remove the unwanted characters.

Something like:

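$string = "você-é-um-ás-da-aviação";
// Transliterate to ASCII; the exact result depends on the iconv implementation and locale.
$collated = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
// Remove everything that is not a hyphen, an ASCII letter, or a digit.
$filtered = preg_replace('`[^-a-zA-Z0-9]`', '', $collated);
echo $filtered;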
Arkh