0

Possible Duplicate:
Automatic clean and SEO friendly URL (slugs)

I need a function which makes "clean URL strings" like Wordpress. For example: "This is a string with frénch and gêrmän special chars + other mean stuff and I'd like to use it as an URL" Shall be transformed into this: "this-is-a-string-with-french-and-german-special-chars-other-mean-stuff-and-id-like-to-use-it-as-an-url"

Please help my laziness, it was a hard day already :-)

Crayl

3 Answers

3

There are many (many) examples available under the title "SEO-friendly URLs".

http://www.intrepidstudios.com/blog/2009/2/10/function-to-generate-a-url-friendly-string.aspx

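function generateSlug($phrase, $maxLength = 100)
{
    $result = strtolower($phrase);

    // Drop everything except letters, digits, whitespace and hyphens
    $result = preg_replace("/[^a-z0-9\s-]/", "", $result);
    // Collapse runs of whitespace/hyphens into a single space
    $result = trim(preg_replace("/[\s-]+/", " ", $result));
    // Cut to the maximum length (100 is an arbitrary default so the one-argument call below works)
    $result = trim(substr($result, 0, $maxLength));
    // Turn the remaining spaces into hyphens
    $result = preg_replace("/\s/", "-", $result);

    return $result;
}

$title = "A bunch of ()/*++\'#@$&*^!%     invalid URL characters  ";

echo(generateSlug($title));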

// outputs
a-bunch-of-invalid-url-characters
JConstantine
  • 2
    This is a good simple approach. Its downside is that it breaks words that have diacritical characters in them. For example `mauvais Noël` turns into `mauvais-nol` which does not make sense. – Mihai Stancu Oct 08 '12 at 19:46
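One way around the diacritics problem is to fold accented letters to ASCII before the stripping step. The following is only a minimal sketch: the function name `generateSlugWithAccents`, the default length of 200 and the sample map are made up for illustration, and it assumes UTF-8 input and the mbstring extension.

```php
// Hypothetical variant of generateSlug(); the name and the map are illustrative only.
function generateSlugWithAccents($phrase, $maxLength = 200)
{
    // Small sample map (an assumption, not a complete table); extend it as needed.
    $map = array(
        'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i',
        'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c',
    );

    // Lowercase with mbstring so accented capitals such as "Ë" are handled too.
    $result = mb_strtolower($phrase, 'UTF-8');
    // Swap accented letters for ASCII ones before anything is stripped.
    $result = strtr($result, $map);

    // Same clean-up steps as generateSlug() above.
    $result = preg_replace("/[^a-z0-9\s-]/", "", $result);
    $result = trim(preg_replace("/[\s-]+/", " ", $result));
    $result = trim(substr($result, 0, $maxLength));
    $result = preg_replace("/\s/", "-", $result);

    return $result;
}

echo generateSlugWithAccents("mauvais Noël"); // mauvais-noel
```

Any letter missing from the map is still dropped by the first `preg_replace`, so the map has to cover the languages you expect.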
1

I'll help your laziness today by giving you a hint about what you will need to work on tomorrow:

$final_string = str_replace(
    array(' ', 'ă', 'â', 'ä'),
    array('-', 'a', 'a', 'a'),
    $initial_string
);

There can be many variations of this, for example using a regular expression (preg_replace) to match groups of characters such as runs of spaces/tabs/newlines (\s+), or several characters that are supposed to have the same replacement (ă|â|ä).

$final_string = preg_replace(
    array('/\s+/', '/ă|â|ä/'),
    array('-', 'a'),
    $initial_string
);
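As a small sketch of the pattern-array variant, the snippet below uses `\s+` (one or more) for whitespace; a `*` quantifier would also match the empty string between characters and insert hyphens there. The sample string is made up for illustration, and the `u` modifier assumes the input is valid UTF-8.

```php
// Illustrative input; not taken from the question.
$initial_string = "să  mănânc\tmâine";

$final_string = preg_replace(
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so the character class matches whole letters rather than single bytes.
    array('/\s+/u', '/[ăâä]/u'),
    array('-', 'a'),
    $initial_string
);

echo $final_string; // sa-mananc-maine
```

The patterns are applied in order, each one working on the result of the previous one, so whitespace is collapsed first and the accented letters are replaced afterwards.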
Mihai Stancu
0

About the closest that you'll get with a vanilla PHP function is urlencode(), but that doesn't produce the output shown in the example in your question.

For example:

$my_string = strtolower(urlencode("This is a string with frénch and gêrmän special chars + other mean stuff and I'd like to use it as an URL"));
echo $my_string;

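Will produce (assuming the source string is ISO-8859-1 encoded; with UTF-8 input, é would come out as %c3%a9, and so on):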

this+is+a+string+with+fr%e9nch+and+g%earm%e4n+special+chars+%2b+other+mean+stuff+and+i%27d+like+to+use+it+as+an+url

Unfortunately, to match WordPress' function, you'll either have to write a function based on their algorithm, or write one from scratch.

BenM
  • 1
    -1 IMO this does not answer the question. URL encoding is a totally different approach - WordPress cleans the URLs of any non-URL encodable characters exactly because they will get replaced by their hexcode equivalents `%27` and the like. – Mihai Stancu Oct 08 '12 at 19:43
  • Yes, but please see the last line of my answer. The question asked if there was a function to do this in PHP. My answer does indeed answer that... – BenM Oct 08 '12 at 19:45
  • 1
    To which the answer is either a clean "no" or "yes - but not without a bit of work", your answer says "yes, but you won't obtain what your example asked", which is just as good as answering "sure, use base64_encode(), not exactly what you wanted, looks kinda ugly but it's clean and all the data is there". – Mihai Stancu Oct 08 '12 at 19:49
  • If you're going to get so nit-picky about the whole situation, shall we downvote your answer too then, since it does not indicate a clean `yes` or `no` either... ;-) – BenM Oct 08 '12 at 19:51
  • In StackOverflow downvoting is not a question of 100% right or wrong. It's a matter of opinion. Some people (like me) may be (at times) nit-pickers - it's true. So you are well within your rights to downvote my answer, and having a solid argument such as the one you said makes that very much OK. The real sink or swim moment for SO answers is their usefulness with regards to the OP. Everything else is community chatter which may sometimes be unfair. But statistically it distributes evenly and makes for a good control/appreciation mechanism. – Mihai Stancu Oct 08 '12 at 20:02