6

I'm currently trying to find the best way (in term of usability and performance) when dealing with a situation like fetching records tagged with a specific tag, or category, or something like that.

A good way (the way I wanted to do), would be to fetch records with the tag/category slug, so the URL would look like :

http://stackoverflow.com/questions/tagged/language-agnostic

fetching records by slug, which looks better than :

http://stackoverflow.com/questions/tag/789/language-agnostic

fetching by ID and adding the slug behind so it's more search-engine friendly. This one is better performance-wise, because fetching data by an integer ID would be faster than a string. (cmiiw)

Now, with a db schema like :

posts    post_to_tags    tags
-----    ------------    ----
id       id              id
title    post_id         name
content  tag_id          slug
...                      ...

am I doing it right ? Is there pitfall or best-practices that I need to know to avoid performance problems ? (eg. tags should not exceed 10,000 records, or tag slug should not exceed n characters, or something else)

Thanks in advance.

andyk
  • 10,019
  • 10
  • 36
  • 41

2 Answers2

5

With the first URL style and your current db design, you can do this:

select ...
from   posts p
join   posts_to_tags pt on pt.post_id = p.post_id
join   tags t on t.id = pt.tag_id
where  t.slug = [url slug value];

As long as tags.slug is indexed, this should be very efficient, hardly any different from

select ...
from   posts p
join   posts_to_tags pt on pt.post_id = p.post_id
where  pt.tag_id = [url tag ID];
Tony Andrews
  • 129,880
  • 21
  • 220
  • 259
  • in the case where there's 10,000 but indexed varchar(30) tags vs the integer one, how much, approximately, is the difference we are talking about ? – andyk Jan 27 '09 at 14:15
  • 2
    Well, we're just talking about one indexed read of the tags table using the given slug value to get the tag_id, so it's peanuts. On Oracle I just experimented 10,000 selects either way and got timings of 1.078 and 1.312 seconds - i.e. 0.0234 milliseconds per query! – Tony Andrews Jan 27 '09 at 15:38
  • I mean, i.e. a *difference of* 0.0234 milliseconds per query. – Tony Andrews Jan 27 '09 at 15:43
  • wow, I really appreciate that. Thanks a lot Tony ! Nice age you have there btw. – andyk Jan 27 '09 at 18:29
  • Note too Andy, that 2 articles *may* (depending on your application) have the same slug... in this case an ID followed by the slug is better. – alex Mar 16 '09 at 11:02
  • yup, that's the extra hassle. I handle the checking and renaming stuff from the CMS, whenever new entry is inserted, following Wordpress' example, eg. some-slug-url into some-slug-url-2. Not so beautiful IMO, that's why ppl should avoid using this method when .. damn 300 characters limit.. – andyk Mar 16 '09 at 11:19
  • .. that's why people should avoid using this method when the site is heavily-packed with entries, especially when the entries are user-generated. Thanks Alex for the insight. – andyk Mar 16 '09 at 11:22
1

The first one is better, but can the slugs possibly be changed? In that case you'd need to have a redirect table (e.g. "some-article-about-dogs" is now "article-about-dogs-and-cats").

Bart van Heukelom
  • 43,244
  • 59
  • 186
  • 301