3

The question in a nutshell: For each row in a forums table, how can I efficiently get the number of rows of the forum_threads table and the forum_posts table that have the respective forum_id value?

I have a page showing a list of forums. For each forum, I would like to show the number of threads and the number of posts contained in that forum, and the time of the most recent post. I store forum data in three tables:

  • forums which contains the forum ID, forum name, etc. Indeces are on forum_id (primary) and ordering
  • forum_threads which contains the thread ID, forum ID, thread title, etc. Indeces are on thread_id (primary), forum_id
  • forum_posts which contains the post ID, thread ID, forum ID, message content, timestamp, etc. Indeces are on post_id (primary), thread_id and user_id.

My (inefficient) code for generating the information I require goes as follows:

- select all rows in the `forums` table
- foreach row:
    - perform a count(thread_id) query in the `forum_threads` table matching this `forum_id`
    - perform a count(post_id) query in the `forum_posts` table matching this `forum_id` 

This is starting to take far too long to compute, and so I am trying to optimize it. I figured let's start with just the num_threads and num_posts bit within a single query:

SELECT
    `forums`.`forum_id`,
    `name`,
    `description`,
    count(forum_threads.thread_id) as num_threads,
    count(forum_posts.post_id) as num_posts
FROM
    (`forums`)
JOIN
    `forum_threads` ON `forums`.`forum_id`=`forum_threads`.`forum_id`
JOIN
    `forum_posts` ON `forums`.`forum_id`=`forum_threads`.`forum_id`
WHERE `hidden` = 0
ORDER BY `ordering`

However, this query fails because there are simply too many posts for the second join to handle. It spins at 100% cpu for a bit, and then fails with error 126 (which I gather typically results from overloading resources). In any case, it succeeds if I comment out that line, and fails when I include it.

What is the efficient way to get the number of rows of two separate tables that share an ID with the row in a "parent" table, without performing multiple separate queries for each result in a previous query?

Mala
  • 14,178
  • 25
  • 88
  • 119
  • 2
    Post them as "normal" queries this wordpress? format is annoying. – Mihai Mar 06 '14 at 22:43
  • These are CodeIgniter Active Records – Mala Mar 06 '14 at 22:44
  • 1
    Also post your indices. – Mihai Mar 06 '14 at 22:45
  • Indeces and raw SQL query posted. – Mala Mar 06 '14 at 22:52
  • Have you tried `INNER JOIN` instead of just `JOIN`? – stealthyninja Mar 06 '14 at 22:57
  • `INNER JOIN` fails in a similar way. `LEFT JOIN` works, but takes far longer than handling all of these in separate queries and results in a single row returned. I suspect I also need a `GROUP BY`, but adding that in makes it loop forever with a `LEFT JOIN`. – Mala Mar 06 '14 at 23:02
  • Post the explain for that query add LIMIT to make it work. – Mihai Mar 06 '14 at 23:02
  • (I'm less interested in "please debug my broken code" and more interested in a general solution to the question at the very top) – Mala Mar 06 '14 at 23:02
  • @Mala the above inner join above for me works pretty fine with tables having ~1 million records. It could be something with the table structure/keys that is causing this. Like Mihai said an explain would definitely help see whats causing the query to be slow. – Aziz Saleh Mar 06 '14 at 23:18
  • Thanks. Table structure is here: http://pastebin.com/UPS1K4tB – Mala Mar 06 '14 at 23:19
  • Hm, adding an index on the `forum_id` field in the `forum_posts` table sped things up *considerably*. It's actually at a reasonably good speed now. – Mala Mar 06 '14 at 23:24

1 Answers1

2

Use sub queries:

SELECT
forums.forum_id, forums.name, forums.description,
(SELECT count(forum_threads.thread_id) FROM forum_threads WHERE forum_threads.forum_id = forums.forum_id)  as num_threads,
(SELECT count(forum_posts.post_id) FROM forum_posts WHERE forum_posts.forum_id = forums.forum_id)  as num_posts
FROM forums
WHERE hidden = 0
ORDER BY ordering
Aziz Saleh
  • 2,687
  • 1
  • 17
  • 27
  • 3
    Can you elaborate why this should be faster than a `JOIN`? – kero Mar 06 '14 at 23:00
  • I have not used subqueries before and as such am not terribly familiar with the syntax, but the above code yields a syntax error. – Mala Mar 06 '14 at 23:04
  • 1
    The syntax error is in the second subquery. replace `WHERE forum_posts` with `FROM forum_posts WHERE`. – Dan Graller Mar 06 '14 at 23:07
  • Ah thanks! This query works, but seems to take approximately 150% as long as doing a query on the forums table, then looping through the results in php performing the two separate queries on each row. – Mala Mar 06 '14 at 23:09
  • 1
    @kingkero not sure about the speed. I just provided an alternative, I would suggest to Mala to test the speed. If someone has a JOIN alternative, that also should be tested. – Aziz Saleh Mar 06 '14 at 23:11
  • @Mala sorry about the syntax error, thanks Mojooo for correcting that. – Aziz Saleh Mar 06 '14 at 23:12
  • It seems odd to me that this query should be slower than breaking it into separate queries each called separately from within PHP, and yet that's what I'm finding o_O – Mala Mar 06 '14 at 23:14
  • 1
    Would it be possible to post the table structures? I think that would greatly help detect whats causing all of these queries to be slow (maybe something with keys?) – Aziz Saleh Mar 06 '14 at 23:16
  • @Azis Saleh: Sure, here you go: http://pastebin.com/UPS1K4tB . I hadn't posted it originally because I was thinking about this more as a general problem, not specific to my particular table structure which I was only including as illustration. Then again, maybe there's something wrong with my structure! – Mala Mar 06 '14 at 23:21
  • Adding an index to the `forum_id` column on `forum_posts` table sped this up considerably, and made it faster than looping through via PHP, so at this point I think I can mark this as accepted. – Mala Mar 06 '14 at 23:25
  • 1
    Yep, it is always best to have foreign keys as indices on the table to speed lookup. There is a discussion on if actually having them as foreign keys increase performance, but that is a different story found here: http://stackoverflow.com/a/507218/1935500. You should also test the INNER JOIN and see if that faster than this query now that you have your indices setup. – Aziz Saleh Mar 06 '14 at 23:28