54

I need to find the latest post for each author and then group the results so I only a single latest post for each author.

SELECT wp_posts.* FROM wp_posts
        WHERE wp_posts.post_status='publish'
        AND wp_posts.post_type='post'
        GROUP BY wp_posts.post_author           
        ORDER BY wp_posts.post_date DESC

This is correctly grouping the output so I only get one post per author, but it is ordering the results after they have been grouped and not before they have been selected.

edze
  • 2,965
  • 1
  • 23
  • 29
Tom
  • 33,626
  • 31
  • 85
  • 109
  • 4
    The quickest way to do this is with one inner query [`code`] (SELECT wp_posts.* FROM (SELECT * FROM wp_posts ORDER BY wp_post.post_date DESC) as wp_posts WHERE wp_posts.post_status='publish' AND wp_posts.post_type='post' GROUP BY wp_posts.post_author;) Maybe it isn't most effecient but it works. – Milos Aug 19 '11 at 12:35
  • 1
    I don't think that the accepted answer for this question is correct and have continued the question over here http://stackoverflow.com/questions/14770671/mysql-order-by-before-group-by – Rob Forrest Feb 08 '13 at 11:05
  • @RobForrest Hey, I'm happy to update the selected answer :) I asked this a few years ago so if there's a better way I'm happy to point people towards it – Tom Feb 08 '13 at 11:42

10 Answers10

23

select wp_posts.* from wp_posts
where wp_posts.post_status='publish'and wp_posts.post_type='post'
group by wp_posts.post_author
having wp_posts.post_date = MAX(wp_posts.post_date) /* ONLY THE LAST POST FOR EACH AUTHOR */
order by wp_posts.post_date desc


EDIT:

After some comments I have decided to add some additional informations.

The company I am working at also uses Postgres and especially SQL Server. This databases don't allow such queries. So I know that there is a other way to do this (I write a solution below). You shoud also have to know what you do if you don't group by all columns treated in the projection or use aggregate functions. Otherwise let it be!

I chose the solution above, because it's a specific question. Tom want to get the recent post for each author in a wordpress site. In my mind it is negligible for the analysis if a author do more than one post per second. Wordpress should even forbid it by its spam-double-post detection. I know from personal experience that there is a really significant benefit in performance doing a such dirty group by with MySQL. But if you know what you do, then you can do it! I have such dirty groups in apps where I'm professionally accountable for. Here I have tables with some mio rows which need 5-15s instead of 100++ seconds.

May be useful about some pros and cons: http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/debunking-group-by-myths.html


SELECT
    wp_posts.*
FROM 
    wp_posts
    JOIN 
    (
        SELECT
            g.post_author
            MAX(g.post_date) AS post_date
        FROM wp_posts as g
        WHERE
            g.post_status='publish'
            AND g.post_type='post'
        GROUP BY g.post_author
    ) as t 
    ON wp_posts.post_author = t.post_author AND wp_posts.post_date = t.post_date

ORDER BY wp_posts.post_date

But if here is more then one post per second for a author you will get more then one row and not the only last one.

Now you can spin the wheel again and get the post with the highest Id. Even here it is at least not guaranteed that you really get the last one.

Pacerier
  • 86,231
  • 106
  • 366
  • 634
edze
  • 2,965
  • 1
  • 23
  • 29
  • 1
    Is it just me, or using MAX() in a HAVING clause doesn't work? I tried it and got a 'Invalid use of aggregate function' error. What am I doing wrong? – Sophivorus Mar 15 '12 at 05:38
  • 7
    It's not just you; I don't think this answer makes sense. I don't get an error, but it gives me only one row, the highest in the entire set, rather than for each group - which does make sense, since HAVING, like ORDER, is processed after the rowset is already built. – LinusR Apr 23 '12 at 18:10
  • 1
    MySQL returns just one row. Other DBMS implements the SQL standard stricter. If want to get a real strict aggregate, you have to group all columns treated in the projection or use aggregate functions. – edze Apr 23 '12 at 20:30
  • MySQL returned an empty result set!!! `SELECT * FROM content JOIN category USING (category_id) WHERE user_id = '1' GROUP BY category_id HAVING content_create_time = MAX(content_create_time) ORDER BY content_create_time DESC` – EmRa228 Jul 03 '12 at 14:56
  • 15
    This should NOT be the accepted answer as it does not work correctly. – Noah Goodrich Jul 10 '12 at 21:13
  • I don't think that this works either. At best it would produce unstable results. @Tom, I'd suggest you remove this as the accepted answer. For what it's worth, I think the only way this can be done is by using a subquery. – Rob Forrest Aug 08 '12 at 14:44
  • It works because it is negligible if the author do more than one post per second. wp_posts.post_date is a DATETIME. Take a look here: http://codex.wordpress.org/images/9/9e/WP3.0-ERD.png – edze Aug 13 '12 at 19:53
  • 1
    So, a question for all of you who claim this doesn't work - is your problem/question the EXACT same as @Tom's? If not, you might need to adjust your query to function in the way you need it. You shouldn't look at this as the be-all-end-all solution, but as a starting point for your needs. Thank you, @edze, this really helped me. – Thumper Dec 21 '12 at 21:46
  • I don't think that the accepted answer for this question is correct and have continued the question over here http://stackoverflow.com/questions/14770671/mysql-order-by-before-group-by – Rob Forrest Feb 08 '13 at 11:06
  • The max(DATE) subquery will return the max date but not return the respective data for that max date.... the post ORDER BY fixes this after but I dont like this method as its not explicit.. `SELECT g.post_author g.post_date FROM wp_posts as g WHERE g.post_status='publish' AND g.post_type='post' AND g.post_date = (SELECT MAX(post_date) FROM wp_posts) GROUP BY g.post_author` – PodTech.io Mar 01 '16 at 09:46
15

Not sure if I understand your requirement correct but following inner statement gets the list of the latest post_date for each author and joins these back with the wp_posts table to get a complete record.

SELECT  *
FROM    wp_posts wp
        INNER JOIN (
          SELECT  post_author
                  , MAX(post_date) AS post_date
          FROM    wp_posts
          WHERE   post_status = 'publish'
                  AND post_type = 'post'
          GROUP BY
                  post.author
        ) wpmax ON wpmax.post_author = wp.post_author
                   AND wpmax.post_date = wp.post_date
ORDER BY
        wp.post_date DESC
Lieven Keersmaekers
  • 57,207
  • 13
  • 112
  • 146
  • This worked for me, but only after I added an additional `ORDER BY` to the `SELECT` in the `INNER JOIN` (before that, it wasn't necessarily returning the most recent post from every author). – ACJ Oct 10 '12 at 08:28
12

I think that @edze response is wrong.

In the MySQL manual you can read:

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.

Two great references:

Sorry, but I can not comment the @edze response because of my reputation, so I have written a new answer.

aanton
  • 5,572
  • 1
  • 23
  • 15
  • 1
    The @Lieven response is the correct one. But if two posts from the same author has got the same date a final GROUP BY is needed. – aanton May 10 '12 at 06:55
  • Second link is really helpful in understanding. – Nikhil Sahu Sep 30 '16 at 07:19
  • That paragraph is a good find. Since cardinality is the sole purpose of using databases (as opposed to spreadsheets), the non-MAX() rows will never contain the same data in a properly structured database. And if they did, that would necessarily defeat the purpose of isolating the MAX() row, which is to find the OTHER distinct data. So MySQL's explanation is effectively pointless. The entire MAX() row should necessarily be selected. This is clearly a MYSQL error, and I suspect the reason for not fixing it is more of a corporate, rather a logical, reason, since AB no longer owns MYSQL. –  Mar 18 '20 at 21:22
8

Do a GROUP BY after the ORDER BY by wrapping your query with the GROUP BY like this:

SELECT t.* FROM (SELECT * FROM table ORDER BY time DESC) t GROUP BY t.author
11101101b
  • 7,679
  • 2
  • 42
  • 52
  • very inefficient, and can easily be rewritten out by MySQL optimizer – newtover Apr 30 '13 at 20:21
  • 1
    I settled on this because it actually worked and ran fast enough (less than 2 seconds on my DB). @newtover I would like to find a solution that is more efficient, but I've tried many other suggestions and I can't get any of them to return exactly what I need (since I have more complex SELECT and WHERE clauses). Can I take my complex query and stick it in a "MySQL optimizer" to fix it? How would that work? – 11101101b May 01 '13 at 19:22
  • I am sorry, I missed your comment. In practice the approach I describe at [http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group/8757062#8757062] is very efficient, but requires good understanding how it all works. If you have a question about a specific query which needs to be optimized, I can help with a specific answer. Your approach is inefficfient because you actually put everything in a temporary not-indexed table and then do a sequential read with a MySQL-specific hack. It is better not to use hacks at all or use them for efficient solutions. – newtover May 06 '13 at 19:09
  • Thank you @newtover for your help. I have posted one of my "problem" queries as another question here: http://stackoverflow.com/questions/16465448/how-do-you-optimize-a-mysql-query-that-joins-on-itself-and-does-a-custom-group – 11101101b May 09 '13 at 15:30
  • That's a good solution but out of interest, why does this work? – MKC Nov 05 '15 at 18:31
  • This solution may work, but it is not *guaranteed* to work, since as @Husky110 points out in his answer, `GROUP BY` chooses "some" matching row. In practice, MySQL may show some consistency, but relying upon this is dangerous. Albeit more verbose, the solution proposed by @Husky110 is better since it explicitly gets the row with the maximum `wp_posts.post_date`. – rinogo Dec 22 '15 at 19:41
4

it doesn't matter if you order before or after the group-statement, because order means only that 213 goes to 123 or 321 and not more. group by takes only SOME entry per column, not only the latest. I consider you working with subselects here like

SELECT wp_posts.* FROM wp_posts
        WHERE wp_posts.post_status='publish'
        AND wp_posts.post_type='post'
        AND wp_posts.post_date = (Select max(post_date) from wp_posts where author = ... )
Husky110
  • 723
  • 2
  • 10
  • 27
4

What do you think about this?? Seems to work for me

SELECT wp_posts.post_author, MAX(wp_posts.post_date), wp_posts.status, wp_posts.post_type
FROM wp_posts
    WHERE wp_posts.post_status='publish'
    AND wp_posts.post_type='post'
    GROUP BY wp_posts.post_author

It brings me all the Authors with the most updated post_date ... Do you identify a problem there?? I don't

Graham
  • 14,885
  • 4
  • 36
  • 42
Alex
  • 57
  • 1
  • 1
1
    SELECT wp_posts.*,max(wp_posts.post_date) FROM wp_posts
    WHERE wp_posts.post_status='publish'
    AND wp_posts.post_type='post'
    GROUP BY wp_posts.post_author 
  • 3
    Hello! Please make sure to add an explanation to your post; having just a block of code as an answer usually doesn't help the asker understand why an answer is correct (granted, this question had been asked and answered two years ago, so I believe that Tom already has his answer!) – Chris Forrence Apr 24 '13 at 19:28
0

Use the below code...

<?php
//get all users, iterate through users, query for one post for the user,
//if there is a post then display the post title, author, content info
$blogusers = get_users_of_blog();
if ($blogusers) {
  foreach ($blogusers as $bloguser) {
    $args = array(
    'author' => $bloguser->user_id,
      'showposts' => 1,
      'caller_get_posts' => 1
    );
    $my_query = new WP_Query($args);
    if( $my_query->have_posts() ) {
      // $user = get_userdata($bloguser->user_id);
      // echo 'This is one post for author with User ID: ' . $user->ID . ' ' . $user-    >user_firstname . ' ' . $user->user_lastname;
      while ($my_query->have_posts()) : $my_query->the_post(); ?>
        <a href="<?php the_permalink() ?>" rel="bookmark" title="Permanent Link to <?    php the_title_attribute(); ?>"><?php the_title(); ?></a>

        <small><?php the_time('F jS, Y') ?> by <?php the_author_posts_link() ?>     </small><?php
        the_content();
      endwhile;
    }
  }
}
?>
Kausha Mehta
  • 2,828
  • 2
  • 20
  • 31
  • Thanks, I was specifically looking for SQL, not PHP. This question already has a good answer. There are plenty of unanswered PHP questions http://stackoverflow.com/questions/tagged/php?sort=unanswered – Tom Apr 25 '14 at 15:17
0

HERE a simple answer from http://www.cafewebmaster.com/mysql-order-sort-group

SELECT * FROM 
(
select * from `my_table` order by timestamp desc
) as my_table_tmp

GROUP BY catid
ORDER BY nid desc

it worked wonders for me

AbcAeffchen
  • 14,400
  • 15
  • 47
  • 66
0

When our table became large, performance need to checked also. I checked all the options in the questions here, with a PM system with a 136K messages and link table with 83K rows.

When you need only count, or only IDs - Alex's solution is the best.

SELECT wp_posts.post_author, MAX(wp_posts.post_date), wp_posts.status, wp_posts.post_type
FROM wp_posts
    WHERE wp_posts.post_status='publish'
    AND wp_posts.post_type='post'
    GROUP BY wp_posts.post_author

When you need other fields, I need to modify Husky110 solution (to my table design - here it is only example - not checked), that in my tables 10x faster than the subquery option:

SELECT wp_posts.* FROM wp_posts,
    (Select post_id as pid,  max(post_date) maxdate from wp_posts where author = ... group by author order by maxdate  desc limit 4) t
    WHERE wp_posts.post_status='publish'
    AND wp_posts.post_type='post'
    AND wp_posts.post_id = pid

This change can select more than one post (one for user, for example), and can be modified to other solutions.

Moshe.

Moshe L
  • 1,797
  • 14
  • 19