
Possible Duplicate:
How to handle diacritics (accents) when rewriting 'pretty URLs'

I want to replace special characters, such as Å Ä Ö Ü é, with "normal" characters (those between a-z and 0-9). And spaces should certainly be replaced with dashes, but that's not really a problem.

In other words, I want to turn this:

en räksmörgås

into this:

en-raksmorgas

What's the best way to do this?

Thank you in advance.

Ivar
  • Possible duplicate of http://stackoverflow.com/questions/465990/how-to-handle-diacritics-accents-when-rewriting-pretty-urls – Lekensteyn Aug 27 '10 at 19:48
  • Hm, didn't see that one - I don't find its title very descriptive. Thank you for pointing it out. – Ivar Aug 27 '10 at 20:29

3 Answers


You can use iconv for the string replacement...

$string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);

Basically, it'll transliterate the characters it can, and drop those it can't (that are not in the ASCII character set)...

Then, just replace the spaces with str_replace:

$string = str_replace(' ', '-', $string);

Or, if you want to get fancy, you can replace all consecutive white-space characters with a single dash using a simple regex:

$string = preg_replace('/\\s+/', '-', $string);

Edit: As @Robert Ros points out, you may need to set the locale before calling iconv, depending on your system's defaults. Execute this line before the iconv line:

setlocale(LC_CTYPE, 'en_US.UTF8');
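
Putting the steps above together, here is a minimal sketch of a single helper. The function name slugify() and the fallback for a failed iconv call are illustrative assumptions, not part of the original answer. Because the transliteration output varies between systems (some produce a" instead of a), the sketch also strips everything that is not a letter, digit, whitespace or dash before inserting the dashes.

```php
<?php
// Hypothetical helper: the name slugify() and the fallback branch are assumptions.
function slugify($string)
{
    // The result of //TRANSLIT depends on LC_CTYPE and on the iconv implementation.
    setlocale(LC_CTYPE, 'en_US.UTF8');

    // Transliterate what can be transliterated; @ hides the notice iconv may raise.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
    if ($ascii === false) {
        // Fall back to the raw input; its non-ASCII bytes are removed below.
        $ascii = $string;
    }

    $ascii = strtolower($ascii);
    // Drop everything except a-z, 0-9, whitespace and dashes. This also removes
    // stray quote marks that some systems emit for accents (r"aksm"orgas).
    $ascii = preg_replace('/[^a-z0-9\s-]/', '', $ascii);
    // Collapse each run of whitespace and dashes into a single dash.
    $ascii = preg_replace('/[\s-]+/', '-', $ascii);

    return trim($ascii, '-');
}

echo slugify('en räksmörgås'), "\n";
```

On a system where iconv transliterates as described above, this should print en-raksmorgas. Where transliteration is unavailable, the accented letters are dropped rather than replaced. Note that setlocale changes process-wide state, so you may prefer to call it once at startup instead of inside the helper.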
ircmaxell
  • +1 also see: http://stackoverflow.com/questions/1284535/php-transliteration Btw, it's important that your locale is set correctly for iconv transliteration to work properly. – Robert Ros Aug 27 '10 at 19:59
  • @Robert Ros: Thanks, I've added that to the answer... – ircmaxell Aug 27 '10 at 20:07
  • Wonderful! But instead of 'ä' I get 'a"'. Not a big problem, I just have to run a preg_replace to remove everything but the characters. But is it meant to be so? I'm just curious. – Ivar Aug 27 '10 at 20:15
  • @Robert Thanks, I always wondered why iconv transliteration sometimes worked and sometimes didn't. – Artefacto Aug 27 '10 at 20:16
  • Huh? You get a quote character? I tested it on my machine, and it worked fine (I got your expected output). Did you run `setlocale` first? – ircmaxell Aug 27 '10 at 20:18
  • Yeah, it's kinda weird. I'm running the setlocale function above the iconv line, but my output is: r"aksm"orgas – Ivar Aug 27 '10 at 20:21
  • The difference in transliteration appears to be based on PHP version: 5.2.9 transliterates both `à` and `ä` into `a`, as expected. But PHP 5.4 transliterates those into `'a` and `"a` respectively. – Joe Aug 09 '12 at 16:24

Check out http://php.net/manual/en/function.strtr.php

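<?php
// Use the array form: the three-argument form maps single bytes, so it
// corrupts multibyte UTF-8 characters such as ä, å and ö.
$addr = strtr($addr, array('ä' => 'a', 'å' => 'a', 'ö' => 'o'));
?>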
fcingolani

A clever hack often used for this is calling htmlentities, then running

preg_replace('/&(\w)(acute|uml|circ|tilde|ring|grave);/', '\1', $str);

to get rid of the diacritics. A more complete (but often unnecessarily complicated) solution is using a Unicode decomposition algorithm to split diacritics, then dropping everything that is not an ASCII letter or digit.
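
The decomposition approach could look roughly like the sketch below. It assumes the intl extension is installed (it provides the Normalizer class). The function name slugifyByDecomposition() is hypothetical. Spaces are turned into dashes, as the question asks, and everything else outside ASCII letters and digits is dropped.

```php
<?php
// Hypothetical helper; assumes the intl extension, which provides Normalizer.
function slugifyByDecomposition($string)
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension is required for Normalizer.');
    }

    // NFD splits "ä" into the base letter "a" plus a combining diaeresis (U+0308).
    $decomposed = Normalizer::normalize($string, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // Remove the combining marks (Unicode category Mn), keeping the base letters.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);

    // Turn each run of whitespace into a dash.
    $dashed = preg_replace('/\s+/u', '-', $stripped);
    // Drop everything that is not an ASCII letter, digit or dash.
    $ascii = preg_replace('/[^A-Za-z0-9-]+/', '', $dashed);
    // Collapse repeated dashes left behind by dropped characters.
    $ascii = preg_replace('/-+/', '-', $ascii);

    return trim(strtolower($ascii), '-');
}

echo slugifyByDecomposition('en räksmörgås'), "\n"; // en-raksmorgas
```

Letters that have no canonical decomposition, such as ø or ß, are simply dropped by this sketch. If those matter, they would need an explicit mapping (for example with strtr) before the final cleanup.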

Tgr