Most of the world uses non-ASCII characters. Some languages use characters such as é, ö, á, ã, and õ, which can be "converted" to ASCII.

Suppose the title of the post is:

Configuração é fácil!

How should that be represented in a URL? Replacing each non-ASCII character with an underscore gives:

www.myblog.com/post/1200/Configura__o-_-f_cil

A much better representation is:

www.myblog.com/post/1200/Configuracao-e-facil

Wikipedia does this, as in http://en.wikipedia.org/wiki/Deja_vu

Will this improve page rank in search engines?

How would you do that in your favorite language?

motobói
  • Check related articles: [http://stackoverflow.com/questions/331279/how-to-change-diacritic-characters-to-non-diacritic-ones](http://stackoverflow.com/questions/331279/how-to-change-diacritic-characters-to-non-diacritic-ones) [http://stackoverflow.com/questions/285228/how-to-convert-utf-8-to-us-ascii-in-java/285890#285890](http://stackoverflow.com/questions/285228/how-to-convert-utf-8-to-us-ascii-in-java/285890#285890) – LicenseQ Feb 18 '09 at 15:20
  • So what are you going to do about Chinese characters? Or Japanese kana? Or the German scharfes S (ß)? I think you need to think about these things before implementing this feature. – Tamas Czinege Feb 18 '09 at 14:36
  • Maybe one can just ignore any character that can't be converted to ASCII. – motobói Feb 18 '09 at 14:39
  • 'Scharfes S' (ß) is generally decomposed into two s's: Straße => Strasse – Markus Schnell Feb 18 '09 at 14:53
  • The problem with transliteration is that you might lose or change the meaning of the words. Take, for example, the German words *Buße* (engl. *penance*) and *Busse* (engl. *buses*), or *Maße* (engl. *measures*, *dimensions*) and *Masse* (engl. *mass*). – Gumbo Feb 18 '09 at 15:22

1 Answer

In Perl

Use Text::Unidecode:

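#!/usr/bin/perl
use strict;
use warnings;
use utf8;               # the source file itself contains UTF-8 literals
use Text::Unidecode;    # CPAN module that transliterates Unicode to ASCII

print unidecode("áéíóú\n");

# That prints: aeiou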
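
For comparison, here is a rough Python sketch of the same idea that also builds the URL slug from the question. It is not a port of Text::Unidecode: it swaps in a simpler technique, Unicode NFKD decomposition from the standard `unicodedata` module followed by dropping the combining marks. The `slugify` name and the `EXTRA` table (which maps ß to ss, as suggested in the comments) are assumptions made for illustration. Characters with no ASCII decomposition, such as Chinese or kana, are simply dropped.

```python
import re
import unicodedata

# Hypothetical hand-written mappings for characters that have no
# Unicode decomposition into an ASCII base letter.
EXTRA = {"ß": "ss", "ẞ": "SS"}


def slugify(title: str) -> str:
    """Return an ASCII, hyphen-separated slug for a post title."""
    # Apply the hand-written mappings first (e.g. ß -> ss).
    text = "".join(EXTRA.get(ch, ch) for ch in title)
    # NFKD splits "ã" into "a" plus a combining tilde.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, then anything still outside ASCII.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    ascii_text = stripped.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-")


print(slugify("Configuração é fácil!"))
```

With this sketch, `slugify("Configuração é fácil!")` gives `Configuracao-e-facil`. A title made up entirely of unconvertible characters gives an empty string, so the URL would still need something like the post ID to stay unique.
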
motobói