The following function builds URL names from news and product titles, which can contain all sorts of characters. The string I want to create consists only of alphanumeric characters and "-", with no leading or trailing "-" or whitespace and no repeated "-". The function below works fine, but is there a way to write it more simply or more efficiently?

function urlName($string) {
    $string = trim($string);                          // strip leading and trailing whitespace
    $string = strtolower($string);                    // all lowercase
    $string = strtr($string, 'äöåÄÖÅ', 'aoaaoa');     // substitute umlauts
    $string = preg_replace('/[\W]+/', '-', $string);  // substitute non-word characters with -
    $string = preg_replace('/^-*|-*$/', '', $string); // no leading or trailing -
    return $string;
}
kontur
  • This has been asked here before with code included. Looking for it now. – John Conde Jun 12 '12 at 13:09
  • possible duplicate of [URL Friendly Username in PHP?](http://stackoverflow.com/questions/2103797/url-friendly-username-in-php) – John Conde Jun 12 '12 at 13:11
  • Why only umlauts? What about other special characters like `áéüíú`? – kapa Jun 12 '12 at 13:12
  • @bažmegakapa Domains are without them, so most users expect paths to behave alike. @John Conde Thanks for the link. I think the problem is somewhat similar, but not quite the same: he is not concerned with names containing non-word characters (like %, -, & and so forth). @Mike B Thanks, I will try to add the tag. – kontur Jun 12 '12 at 13:25
  • @bažmegakapa I actually misread your comment. You are quite right about other characters as well. I should mention that the server is running PHP 4.3, so something like yent's conversion table might be necessary :( – kontur Jun 12 '12 at 13:29

2 Answers

I often use this:

function simpleText($s) {
    $rpl = array(
        "À" => 'A', "Á" => 'A', "Â" => 'A', "Ã" => 'A', "Ä" => 'A', "Å" => 'A',
        "à" => 'a', "á" => 'a', "â" => 'a', "ã" => 'a', "ä" => 'a', "å" => 'a',
        "Ò" => 'O', "Ó" => 'O', "Ô" => 'O', "Õ" => 'O', "Ö" => 'O', "Ø" => 'O',
        "ò" => 'o', "ó" => 'o', "ô" => 'o', "õ" => 'o', "ö" => 'o', "ø" => 'o',
        "È" => 'E', "É" => 'E', "Ê" => 'E', "Ë" => 'E',
        "è" => 'e', "é" => 'e', "ê" => 'e', "ë" => 'e',
        "Ç" => 'C',
        "ç" => 'c',
        "Ì" => 'I', "Í" => 'I', "Î" => 'I', "Ï" => 'I',
        "ì" => 'i', "í" => 'i', "î" => 'i', "ï" => 'i',
        "Ù" => 'U', "Ú" => 'U', "Û" => 'U', "Ü" => 'U',
        "ù" => 'u', "ú" => 'u', "û" => 'u', "ü" => 'u',
        "Ÿ" => 'Y',
        "ÿ" => 'y',
        "Ñ" => 'N',
        "ñ" => 'n'
    );

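    // Transliterate accented characters, then collapse whitespace runs into "_"
    $s = preg_replace('`\s+`', '_', strtr($s, $rpl));
    // Drop everything except letters, digits, "-" and "_"; collapse repeated "_"; lowercase
    $s = strtolower(preg_replace('`_+`', '_', preg_replace('`[^-_A-Za-z0-9]`', '', $s)));
    // Strip leading and trailing "_"
    return trim($s, '_');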
}
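
A small usage sketch of the table idea, in case it helps. It assumes the source file and the input are both UTF-8 encoded, and the shortened `$map` and the sample string are made up for illustration. The array form of `strtr` matches each key as a whole string, so multibyte characters are replaced as units, whereas the three-argument form translates byte by byte.

```php
<?php
// Assumption: this file and the input string are UTF-8 encoded.
// $map is a shortened, illustrative table, not the full one above.
$map = array(
    'ä' => 'a', 'ö' => 'o', 'å' => 'a',
    'Ä' => 'A', 'Ö' => 'O', 'Å' => 'A'
);

// Array form of strtr: every key is matched as a complete string,
// so two-byte UTF-8 characters are handled correctly.
$ascii = strtr('Älg på ön', $map);

echo $ascii, "\n"; // Alg pa on
```

With a single-byte encoding such as ISO-8859-1 the three-argument form would behave the same way, so which form is needed depends on how the titles are stored.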
yent

I think your code can be compacted to this:

function urlName($string) {
    // Patterns run in order: collapse non-word runs to "-" first, then strip "-" from both ends
    $patterns = array('/[\W]+/', '/^-+|-+$/');
    $replacements = array('-', '');

    $string = strtr(strtolower($string), 'äöåÄÖÅ', 'aoaaoa');
    // or you can use:
    // $string = strtr(strtolower($string), $someTrMapping);

    return preg_replace($patterns, $replacements, $string);
}
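
A minimal sketch of how the array form of `preg_replace` behaves: the patterns are applied one after another, each to the result of the previous one, so their order matters. The name `slugify` and the sample titles are hypothetical, and the sketch uses `[^a-z0-9]` instead of `\W` on the assumption that underscores should also become "-" (`\W` treats "_" as a word character and leaves it alone).

```php
<?php
// Hypothetical helper for illustration; not part of the original code.
function slugify($title) {
    // Step 1 turns every run of other characters into a single "-".
    // Step 2 then strips any "-" left at either end.
    $patterns = array('/[^a-z0-9]+/', '/^-+|-+$/');
    $replacements = array('-', '');

    return preg_replace($patterns, $replacements, strtolower($title));
}

echo slugify('  Hello, World!  '), "\n"; // hello-world
echo slugify('50% off -- today'), "\n";  // 50-off-today

// With the two steps reversed, the trim runs before the "!" has become a "-",
// so a trailing "-" is left behind.
$wrongOrder = preg_replace(
    array('/^-+|-+$/', '/[^a-z0-9]+/'),
    array('', '-'),
    'hello world!'
);
echo $wrongOrder, "\n"; // hello-world-
```

Running the end-trimming pattern last is what makes a separate `trim()` call unnecessary here.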
anubhava
  • Thanks for your comment. I noticed that the trim wasn't needed, since the right regexp filters out the empty ends anyway. I'll accept your answer because I did not know you can feed preg_replace arrays of patterns and replacements, not just single values. – kontur Jun 13 '12 at 06:33