
I wrote a function called clean in PHP for a project, to help build readable URLs from database content. It removes spaces and special characters, so that a string like "My Motörhead Albums" becomes my-motoerhead-albums in the URL. However, it does not seem to convert umlauts such as ö, ä, and ü correctly, and I can't figure out why.

Here's the code:

function clean($text) {
    $text = trim($text);
    $text = strtolower($text);
    $code_entities_match = array(
        ' ',    '--',    '"',    '!',    '@',    '#',    '$',    '%',    '^',    '&',
        '*',    '(',    ')',    '_',    '+',    '{',    '}',    '|',    ':',    '"',
        '<',    '>',    '?',    '[',    ']',    '\\',    ';',    "'",    ',',    '.',
        '/',    '*',    '+',    '~',    '`',    '=',    '¡',    '¿',    '´',    '%C2%B4',
        'ä',    'ö',    'ü',    'ß',    'å',    'á',    'à',
        'ó',    'ò',    'ú',    'ù',    'í',    'é',    'è',    'ø',    'Þ',    'ð',    '%C3%9E',    '&thorn;'
    );
    $code_entities_replace = array(
        '',    '-',    '',    '',    '',    '',    '',    '',    '',    '',
        '',    '',    '',    '',    '',    '',    '',    '',    '',    '',
        '',    '',    '',    '',    '',    '',    '',    '',    '',    '',
        '',    '',    '',    '',    '',    '',    '',    '',    '',    '',
        'ae',    'oe',    'ue',    'ss',    'aa',    'a',    'a',
        'o',    'o',    'u',    'u',    'i',    'e',    'e',    'oe',    'th',    'th',    'th',    'th'
    );
    $text = str_replace($code_entities_match, $code_entities_replace, $text);
    return $text;
}

rayne
  • "Doesn't work" is not a proper problem description. Please tell us what exactly it does or doesn't do. – Pekka Sep 30 '10 at 10:44
  • Also, you may find better approaches to the problem here: http://stackoverflow.com/questions/465990/how-to-handle-diacritics-accents-when-rewriting-pretty-urls – Pekka Sep 30 '10 at 10:45

1 Answer


This is the function I use to build url-safe strings:

static public function slugify($text)
{
  // replace spaces with underscores
  $text = str_replace(" ", "_", $text);

  // replace runs of anything other than letters, digits, or underscores with -
  $text = preg_replace('~[^\\pL\d_]+~u', '-', $text);

  // trim
  $text = trim($text, '-');

  // transliterate
  $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

  // lowercase
  $text = strtolower($text);

  // remove unwanted characters
  $text = preg_replace('~[^-\w]+~', '', $text);

  if (empty($text))
  {
    return 'n-a';
  }

  return $text;
}

It was taken from symfony's Jobeet tutorial.
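
If the output must not depend on how iconv transliterates on a given server, one option is to map the umlauts explicitly before cleaning up the rest. The following is a minimal sketch, not part of the Jobeet code: the name slugify_with_map and the contents of the map are assumptions for illustration, it uses a single dash for every separator, and it expects both the source file and the input to be UTF-8.

```php
<?php
// Hypothetical helper, not from the Jobeet tutorial: maps German umlauts
// explicitly so the result does not depend on iconv or the PHP version.
function slugify_with_map($text)
{
    // Explicit transliteration table (an assumption; extend as needed).
    $map = array(
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue',
        'ß' => 'ss',
    );
    // strtr() with an array prefers the longest matching key and never
    // rescans text it has already replaced.
    $text = strtr($text, $map);

    // Lowercase the remaining ASCII letters.
    $text = strtolower($text);

    // Collapse every run of other characters into a single dash.
    $text = preg_replace('~[^a-z0-9]+~', '-', $text);

    // Trim dashes left at either end.
    $text = trim($text, '-');

    // Same placeholder as the function above for empty results.
    return $text === '' ? 'n-a' : $text;
}

echo slugify_with_map('My Motörhead Albums'), "\n"; // my-motoerhead-albums
```

Because strtr() with an array tries the longest key first and does not rescan replaced text, the order of the map entries does not matter. Characters that are missing from the map are turned into dashes rather than transliterated.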

Maerlyn
  • Why have different separators for spaces and non-letters? It makes the result unnecessarily ugly. Use a dash for both, and don't forget to remove duplicates. – Your Common Sense Sep 30 '10 at 10:52
  • I need different results for `aa'bb` and `aa bb`, hence the different separators. The duplicates are a good catch, thanks. – Maerlyn Sep 30 '10 at 11:04
  • Thanks! I've tried it, but instead of converting my string to "my-motoerhead-albums", all I get is "my-mot" - it removes everything after the umlaut. Is it supposed to do that? – rayne Sep 30 '10 at 13:33
  • For me it returns "my_motorhead_albums". I'm using PHP 5.3.2. – Maerlyn Sep 30 '10 at 14:08
  • Oh, that's a good explanation! I just realized that my customer's webspace is still running some old PHP 4... – rayne Sep 30 '10 at 14:25