0

I have two tables: posts and categories. The posts table has about 360,000 lines.

I want to show only the first post of each category, ordering by date and using pagination.

The query:

SELECT * FROM 
   (SELECT * FROM posts ORDER BY date_post DESC) as temp 
   GROUP BY id_category ORDER BY date_post DESC 
   LIMIT $offset, $limit"

The query is taking about 1 minute to load and show my site.

I've tried changing MyISAM to InnoDB and use partitions whithout success.

The server that the website is being hosted is a dedicated server, and I believe that the problem is not it.

Anyone have some suggestions?

  • If there are no too many categories, maybe it is better to try to write one query for each category? – artahian Mar 19 '13 at 17:09
  • There are no aggregating functions in your query, so GROUP BY is redundant in this context. Perhaps you meant DISTINCT? – Strawberry Mar 19 '13 at 17:09
  • As it currently stands why not just use SELECT * FROM posts as temp instead of putting a second SELECT in? – jonners99 Mar 19 '13 at 17:11
  • I've already tried DISTINCT, but I need to echo more than 1 column.. with DISTINCT I've couldn't show all the columns I'd like to – Allisson Ferreira Mar 19 '13 at 17:12
  • Are you sure that the DB is the bottleneck? Meaning, does the page load as expected with a simpler query? – mcknz Mar 19 '13 at 17:26
  • I've started with a simple query: SELECT * FROM posts GROUP BY id_category ORDER BY date_post DESC LIMIT $offset, $limit @sgeddes suggested this, but it didn't worked.. – Allisson Ferreira Mar 19 '13 at 18:01

3 Answers3

1

You can simplify your query as follows:

   SELECT * 
   FROM posts
   GROUP BY id_category
   ORDER BY date_post DESC 
   LIMIT $offset, $limit

I'm not sure what you were trying to accomplish with the subquery. Also not sure the GROUP BY is needed at all, but left it in there.

sgeddes
  • 62,311
  • 6
  • 61
  • 83
0

You need to refactor the query:

SELECT posts.* FROM
(
    SELECT id_category,MAX(date_post) date_post
    FROM posts GROUP BY id_category
) postkeys LEFT JOIN posts USING (id_category,date_post);

This should give you every product along with the most recent post for that product.

CAVEAT

I deliberately moved the LIMIT clause inside a subquery to produce just the needed range of ids. This works very, very fast !!!

I have learned this technique for a YouTube Video : http://www.youtube.com/watch?v=ZVisY-fEoMw&feature=share&list=PL0194B59719B45A96

I applied this to a post question I answered in StackOverflow : Fetching a Single Row from Join Table

Give it a Try !!!

Community
  • 1
  • 1
RolandoMySQLDBA
  • 43,883
  • 16
  • 91
  • 132
  • Almost did it! The loading time is about 10 secs.. But I've noticed that the query is showing a few posts per day. Example: I have 10 products of different categories registered yesterday, it should be showing these 20 products first, but it's showing only 3.. 20 products of different categories regisered yesterday, showing only 6. – Allisson Ferreira Mar 19 '13 at 18:08
  • I had a typo, it said `GROUP BY id _category`. I changed it to `GROUP BY id_category`. Try again please... – RolandoMySQLDBA Mar 19 '13 at 18:10
  • I've saw that when I tried your code, I changed it up before testing :P – Allisson Ferreira Mar 19 '13 at 18:17
  • I changed the query again. I moved the order by outside the subquery. – RolandoMySQLDBA Mar 19 '13 at 18:36
  • Now the ordering look like totally random, it's showing products registered on February, March, January.. I've tried to move the LIMIT with the ORDER BY outside the subquery and also the ORDER BY inside and the LIMIT outside the subquery.. These two attemps resulted the same as your first tip. – Allisson Ferreira Mar 19 '13 at 19:13
  • Retry it again using `LEFT JOIN` instead of `INNER JOIN`. I forgot that `INNER JOINs undo an subquery ordering. – RolandoMySQLDBA Mar 19 '13 at 19:16
0

Your query is not correct because you are making use of non-aggregated columns in a GROUP BY query, and the values of those columns can be undetermined (you have no guarantees that you will get the first post).

Don't know if it's faster, but if you are sure that there are no multiple posts with the same timestamp, you could use this:

SELECT posts.*
FROM
  posts INNER JOIN (
    SELECT
      id_category, MAX(date_post) mx_date
    FROM
      posts
    GROUP BY
      id_category
  ) mx ON posts.id_category=mx.id_category
          AND posts.date_post=mx.mx_date
ORDER BY
  posts.date_post DESC
LIMIT $offset, $limit

Please see fiddle here.

Of course, make sure that you have an index on id_category and also on date_post. If you want to consider the fact than more than one post can share the same timestamp, we an id and we need to add one more join.

fthiella
  • 48,073
  • 15
  • 90
  • 106
  • It worked, but is still very slow.. See, the problem isn't the data that i'm receiving, but the time of process. Thank you anyway :D – Allisson Ferreira Mar 19 '13 at 17:57
  • @AllissonFerreira your query will usually work, but it's not documented, and if you change platform or upgrade your server it might return wrong data. The only thing you can do to improve performances here is to use indexes. – fthiella Mar 19 '13 at 18:00