In our system, we have URLs for pages where the content, including the title, is based on user-generated content. I'm trying to figure out the best design that balances SEO, human readability and resiliency.
I've been reading a bunch of material on this, including Tim Berners-Lee's document from way back, "Cool URIs don't change."
As an example, imagine a book review site where users submit a written review along with the book's title and author.
So if a user submitted a review of A Tale of Two Cities (possibly misspelling the title by accident) with the author Charles Dickens, the URL could be:
http://foo.com/charles-dickens/a-tale-of-two-cities
Later on, if another book by Dickens is added, it could be:
http://foo.com/charles-dickens/oliver-twist
Then http://foo.com/charles-dickens/ could list all of the reviewed books by Dickens on the site.
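A minimal Python sketch of how author/title URLs like the ones above could be derived from user-submitted text. The helpers slugify, review_url and author_url are hypothetical, and the rule of folding accented characters to ASCII is an assumption, not something the site necessarily does.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn free-form user text into a lowercase, hyphen-separated URL segment."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent) and
    # drop anything that cannot be represented in ASCII (assumed policy).
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen,
    # then trim hyphens from both ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def review_url(base: str, author: str, title: str) -> str:
    """Hypothetical helper: build a /author/title URL for a single review."""
    return f"{base}/{slugify(author)}/{slugify(title)}"


def author_url(base: str, author: str) -> str:
    """Hypothetical helper: build the per-author listing URL, e.g. /charles-dickens/."""
    return f"{base}/{slugify(author)}/"
```

For example, review_url("http://foo.com", "Charles Dickens", "Oliver Twist") yields http://foo.com/charles-dickens/oliver-twist. Because the slug is computed purely from the submitted text, any edit to the title or author produces a different URL.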
However, the problem comes into play if the book title is changed. Imagine the user misspelled something, say A Tale of Two City, and it's later corrected. That would also change the URL, breaking any external links to the page and losing its PageRank, etc.
What is the recommended way to handle this type of problem? Options I see:
First commit wins: the URL can never change once it has been established.
Last commit wins: always change the URL. If the user-generated content changes, revise the URL to match. With this approach, either the old URL dies, or a trail of all previous URLs is kept and every one of them still works. Stack Overflow seems to do this.
Don't base the URL on UGC: ignore the user-generated content and generate URLs independently of it, e.g. http://foo.com/reviews/1234.
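To make the "trail" variant of last commit wins concrete, here is a minimal sketch assuming an in-memory store. The SlugRegistry class, its method names, and the choice of answering old paths with a 301 redirect are illustrative assumptions, not how Stack Overflow or any particular framework actually implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SlugRegistry:
    """Hypothetical in-memory store for 'last commit wins' with a redirect trail.

    Each review has exactly one canonical path, but every path it has ever
    had is remembered so that old external links can still be answered.
    A real site would keep this in its database rather than in memory.
    """

    # review id -> the review's current (canonical) path
    canonical: Dict[int, str] = field(default_factory=dict)
    # every path ever published -> the review id that owns it
    owner: Dict[str, int] = field(default_factory=dict)

    def publish(self, review_id: int, path: str) -> None:
        """Make `path` the review's canonical URL; earlier paths stay as aliases."""
        existing = self.owner.get(path)
        if existing is not None and existing != review_id:
            # Never hand a path that once pointed at one review to another one.
            raise ValueError(f"{path} already belongs to review {existing}")
        self.canonical[review_id] = path
        self.owner[path] = review_id

    def resolve(self, path: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, redirect target or None) for a requested path."""
        review_id = self.owner.get(path)
        if review_id is None:
            return 404, None  # never published
        current = self.canonical[review_id]
        if path == current:
            return 200, None  # serve the page
        return 301, current  # old path: permanent redirect to the current one
```

For instance, after publishing review 1234 at /charles-dickens/a-tale-of-two-city and then at /charles-dickens/a-tale-of-two-cities, resolving the misspelled path returns (301, "/charles-dickens/a-tale-of-two-cities"). Under the third option, each review would only ever be published at one path such as /reviews/1234, so the redirect branch would never be taken.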
What are people's thoughts on this?