Best way to design URIs when they are based on user generated content

Question

In our system, we have URLs for pages where the content, including the title, is based on user generated content. I'm trying to figure out the best design that balances SEO, human readability and resiliency.

I've been reading a bunch of material on this, including Tim Berners-Lee's document from way back: Cool URIs don't change.

As an example, imagine I have a book review site where users are submitting content (a worded review) and the book's title and author.

So if they submitted a book review for A Tale of Two Cities, with the author given as Charles Dickens, the URL could be:

http://foo.com/charles-dickens/a-tale-of-two-cities

Later on, if another book by Dickens is added, it could be:

http://foo.com/charles-dickens/oliver-twist

Then http://foo.com/charles-dickens/ could be a list of all the reviewed books on the site.

However, the problem comes into play if a change is made to the book title. Imagine the user misspelled it as A Tale of Two City, and it's later corrected. This would also change the URL and would break any external links to that page, lose its PageRank, etc.

What is the recommended way to handle this type of problem? Options I see:

1. First commit wins: No changes to the URL are possible after it's initially established.
2. Last commit wins: Always change the URL. So if there's a change to the user generated content, revise the URL. With this approach, either the old URL is dead, or a trail of all the URL changes is preserved and all of them still function. Stack Overflow seems to do this.
3. Don't base the URL on UGC: Ignore the user generated content and just come up with URLs not based on it. So the URL could be http://foo.com/reviews/1234.

What are people's thoughts on this?

score 1 · Accepted Answer · edited May 23 '17 at 10:24

You're slightly wrong; Stack Overflow combines #2 and #3. A question has a specific id, and that's all you need to locate the question. For example, this question's id is 11011252. You can access the question with https://stackoverflow.com/questions/11011252, no need to add the portion of the URL (or would you call it a URI in this case?) generated from the question title. In fact, that will get automatically tacked on (whether by redirect or some other method) when you use the titleless address.

Even better, you can append whatever you want (within reason, I suppose) to the end of the address. https://stackoverflow.com/questions/11011252/this-text-will-be-ignored will take you to the question without any problem.

Stack Overflow isn't the only website that does this, either; many other sites I've seen focused on user-generated content follow the same protocol/whatever you call it. It seems like the best method to go with, as it combines the advantages of #3 (underlying URI remains the same) with the advantages of #2 (the URL contains some information about its target, which users will like), and best of all means you won't get any URI conflicts if two people generate content with the same non-unique identifiers.
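For anyone wondering how this might look in practice, here's a rough sketch in Python. It is not how Stack Overflow actually does it (I don't know their implementation); it just illustrates the idea that the id alone locates the content and the title part is decorative. The `/reviews/<id>/<slug>` layout, the `REVIEWS` dict standing in for a database, and the names `slugify`, `canonical_path` and `resolve` are all made up for the example, and answering with a 301 redirect is one choice among several.

```python
import re
import unicodedata

# Hypothetical in-memory store standing in for a database table keyed by id.
REVIEWS = {1234: {"author": "Charles Dickens", "title": "A Tale of Two Cities"}}

# Matches /reviews/<id> with an optional, ignored /<slug> part.
PATH_RE = re.compile(r"^/reviews/(\d+)(?:/([^/]*))?/?$")


def slugify(text):
    """Lowercase, drop accents, and join alphanumeric runs with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def canonical_path(review_id):
    """Build the current canonical path from the id and the current title."""
    title = REVIEWS[review_id]["title"]
    return f"/reviews/{review_id}/{slugify(title)}"


def resolve(path):
    """Return (status, value): 200 with the id, 301 with the canonical
    path to redirect to, or 404 with None."""
    match = PATH_RE.match(path)
    if not match:
        return 404, None
    review_id = int(match.group(1))
    if review_id not in REVIEWS:
        return 404, None
    canonical = canonical_path(review_id)
    if path != canonical:
        # Missing, stale, or arbitrary slug: the id still finds the review.
        return 301, canonical
    return 200, review_id
```

With this, `/reviews/1234`, `/reviews/1234/a-tale-of-two-city` and `/reviews/1234/this-text-will-be-ignored` all end up at the same review. If the title is corrected later, the canonical path changes, but every old link keeps working because only the id is used for the lookup.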

Oh snap, you're right. I didn't even see the ID there. This approach is a good one. Thanks! — TMC, Jun 15 '12 at 05:08
+1: Great answer. Do you have any idea how to achieve it? I'm trying to build the same sort of thing in CI, but I can't wrap my head around the architecture. — Jezen Thomas, Dec 08 '12 at 15:00
Unfortunately I haven't had enough experience with web development to know how to actually implement this. — JAB, Dec 08 '12 at 18:04
