Something I have noticed on the StackOverflow website:

If you visit the URL of a question on StackOverflow.com:

"https://stackoverflow.com/questions/10721603"

The website adds the name of the question to the end of the URL, so it turns into:

"https://stackoverflow.com/questions/10721603/grid-background-image-using-imagebrush"

This is great: I understand that it makes the URL more meaningful and is probably a good technique for SEO.

What I Wanted to Achieve After Seeing This Implementation on StackOverflow

I wish to implement the same thing on my website. I am happy to use a header() 301 redirect to achieve this, but I am trying to come up with a tight script that will do the trick.
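
For illustration, the redirect step could be sketched roughly as below. The function name `slug_redirect_target` and the `/questions/{id}/{slug}` path layout are assumptions made for this example, mirroring the StackOverflow URLs above; they are not part of any existing code.

```php
<?php
// Hypothetical helper: returns the URL to redirect to, or null when the
// request already uses the canonical "/questions/{id}/{slug}" form.
function slug_redirect_target(string $requestPath, int $id, string $slug): ?string
{
    $canonical = "/questions/{$id}/{$slug}";
    return $requestPath === $canonical ? null : $canonical;
}

// Possible usage in a page script (web context only, so left as comments):
// $path   = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
// $target = slug_redirect_target($path, $id, $slug);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Keeping the comparison in a small function separate from the header() call makes it possible to check the logic without sending a real HTTP response.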

My Code so Far

Please see it working by clicking here

// Set the title of the page article (This could be from the database).  Trimming any spaces either side
$original_name = trim(' How to get file creation & modification date/times in Python with-dash?');

// Replace any characters that are not A-Za-z0-9 or a dash with a space
$replace_strange_characters = preg_replace('/[^\da-z-]/i', " ", $original_name);

// Replace any spaces (or multiple spaces) with a single dash to make it URL friendly
$replace_spaces = preg_replace("/([ ]{1,})/", "-", $replace_strange_characters);

// Remove any leading or trailing dashes, and any run of two or more dashes
$removed_dashes = preg_replace("/^([\-]{0,})|([\-]{2,})|([\-]{0,})$/", "", $replace_spaces);

// Show the finished name on the screen
print_r($removed_dashes);

The Problem

I have created this code and it works fine by the looks of things: it makes the string URL friendly and readable to the human eye. However, I would like to see if it is possible to simplify it or "tighten it up" a bit, as I feel my code is probably overcomplicated.

It is not so much that I want it put onto one line, because I could do that by nesting the functions into one another, but I feel that there might be an overall simpler way of achieving it - I am looking for ideas.

In summary, the code achieves the following:

  • Removes any "strange" characters and replaces them with a space
  • Replaces any spaces with a dash to make it URL friendly
  • Returns a string without any spaces, with words separated with dashes and has no trailing spaces or dashes
  • String is readable (doesn't contain percentage signs and + symbols, as it would when simply using urlencode())

Thanks for your help!

Potential Solutions

I found out whilst writing this question that what I am looking for is known as a URL 'slug', and that slugs are indeed useful for SEO.

I found this library on Google Code, which appears to work well at first glance.

There is also a notable question on this on SO which can be found here, which has other examples.

Luke

2 Answers

I tried to play with preg_replace like you did. However, it gets more and more complicated once you start dealing with foreign languages. What I ended up doing was simply trimming the title and using urlencode:

$url_slug = urlencode($title);

I also had to add these:

$title = str_replace('/','',$title); //Apache doesn't like this character even encoded
$title = str_replace('\\','',$title); //Apache doesn't like this character even encoded
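
As a rough illustration of what this approach produces, here is a sketch that runs the title from the question through the same calls. It assumes the slashes are stripped before encoding (the order is not stated above) and merges the two str_replace() calls into one.

```php
<?php
// Sample title taken from the question.
$title = trim(' How to get file creation & modification date/times in Python with-dash?');

// Assumed order: remove forward slashes and backslashes first, then encode.
$title = str_replace(['/', '\\'], '', $title);
$url_slug = urlencode($title);

echo $url_slug, PHP_EOL;
// How+to+get+file+creation+%26+modification+datetimes+in+Python+with-dash%3F
```

The result is a valid URL segment, but spaces become + signs and the & and ? characters become percent escapes.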

There are also 3rd party libraries such as: http://cubiq.org/the-perfect-php-clean-url-generator

Nathan H
  • [Doesn't return a string that is readable](http://tehplayground.com/#e7SykpUcs) - Please read the question? – Luke Oct 30 '13 at 15:36
  • Liking the look of that 3rd party library. Have you used it before, and did it cause the same problems that you are mentioning? – Luke Oct 30 '13 at 15:40
  • I did not use the 3rd party, because foreign language was a very important part for me. I chose to stick to URLEncode, even though it doesn't always look as pretty, it works fine. – Nathan H Oct 30 '13 at 16:05

Indeed, you can simplify it like this:

$original_name = ' How to get file creation & modification date/times in Python with-dash?';

$result = preg_replace('~[^a-z0-9]++~i', '-', $original_name);
$result = trim($result, '-');
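
To make the two statements reusable, they could be wrapped in a function, as in the sketch below. The name `slugify` is made up for this example; the pattern and the trim() call are the ones shown above.

```php
<?php
// Hypothetical wrapper; the name "slugify" is an assumption for this sketch.
function slugify(string $title): string
{
    // Collapse every run of characters other than a-z, A-Z and 0-9 into one dash.
    $slug = preg_replace('~[^a-z0-9]++~i', '-', $title);
    // Strip any dash left at either end of the string.
    return trim($slug, '-');
}

echo slugify(' How to get file creation & modification date/times in Python with-dash?'), PHP_EOL;
// How-to-get-file-creation-modification-date-times-in-Python-with-dash
```

Because each run of unwanted characters is replaced by a single dash in one pass, no separate step is needed to collapse repeated dashes.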

To deal with other alphabets you can use this pattern instead:

~\P{Xan}++~u

or

~[^\pL\pN]++~u
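
A minimal sketch of the second pattern in use, assuming UTF-8 input. The function name `slugify_unicode` and the sample title are invented for this example.

```php
<?php
// Hypothetical wrapper around the \pL/\pN pattern; the name is an assumption.
function slugify_unicode(string $title): string
{
    // \pL matches any Unicode letter and \pN any Unicode number; the "u"
    // modifier makes PCRE treat the pattern and the subject as UTF-8.
    $slug = preg_replace('~[^\pL\pN]++~u', '-', $title);
    // preg_replace() returns null when the subject is not valid UTF-8,
    // so fall back to an empty string in that case.
    return trim($slug ?? '', '-');
}

echo slugify_unicode('Crème brûlée à la carte?'), PHP_EOL;
// Crème-brûlée-à-la-carte
```

Accented letters are kept as they are, so the slug stays readable, but it still has to be percent-encoded when it is placed in an actual URL.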
Casimir et Hippolyte