295

There are plenty of similar questions to be found on here but I don't think that any answer the question adequately.

I'll continue from the current most popular question and use their example if that's alright.

The task in this instance is to get the latest post for each author in the database.

The example query produces unusable results as its not always the latest post that is returned.

SELECT wp_posts.* FROM wp_posts
    WHERE wp_posts.post_status='publish'
    AND wp_posts.post_type='post'
    GROUP BY wp_posts.post_author           
    ORDER BY wp_posts.post_date DESC

The current accepted answer is

SELECT
    wp_posts.*
FROM wp_posts
WHERE
    wp_posts.post_status='publish'
    AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author
HAVING wp_posts.post_date = MAX(wp_posts.post_date) <- ONLY THE LAST POST FOR EACH AUTHOR
ORDER BY wp_posts.post_date DESC

Unfortunately this answer is plain and simple wrong and in many cases produces less stable results than the orginal query.

My best solution is to use a subquery of the form

SELECT wp_posts.* FROM 
(
    SELECT * 
    FROM wp_posts
    ORDER BY wp_posts.post_date DESC
) AS wp_posts
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author 

My question is a simple one then: Is there anyway to order rows before grouping without resorting to a subquery?

Edit: This question was a continuation from another question and the specifics of my situation are slightly different. You can (and should) assume that there is also a wp_posts.id that is a unique identifier for that particular post.

Community
  • 1
  • 1
Rob Forrest
  • 7,329
  • 7
  • 52
  • 69
  • 2
    As you mentioned in the comments to the given answers, it might be possible to have some posts with the same Timestamp. If so, please give an example with data and the expected result. And please describe, why you expect this result. `post_author` and `post_date` are not enough to get a unique row, so there has to be more to get a unique row per `post_author` – Sir Rufo Feb 12 '13 at 14:17
  • @SirRufo You're right, I've added in an edit for you. – Rob Forrest Feb 12 '13 at 16:42
  • `There are plenty of similar questions to be found on here but I don't think that any answer the question adequately.` That's what bounties are for. – Lightness Races in Orbit Feb 12 '13 at 16:43
  • @LightnessRacesinOrbit, if the current question already has an accepted answer that in my opinion is wrong, what would you suggest doing? – Rob Forrest Feb 12 '13 at 16:53
  • @RobForrest: Downvote, leave a comment, and offer a bounty. – Lightness Races in Orbit Feb 12 '13 at 17:32
  • It looks like you're talking about the [groupwise max](http://jan.kneschke.de/projects/mysql/groupwise-max/) problem. Many people have [written about it](http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/), including how to solve it [without using subqueries](http://www.xaprb.com/blog/2007/03/14/how-to-find-the-max-row-per-group-in-sql-without-subqueries/). – TehShrike Feb 15 '13 at 19:29
  • @RobForrest see this http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group/1313293#1313293 – Timo Huovinen Sep 13 '13 at 14:55
  • If you need the two columns only (e.g. ID and its latest timestamp), this might work: http://stackoverflow.com/a/4448536/722036. It's **faster** than using sub-queries on a huge table with millions of rows. – ᴍᴇʜᴏᴠ Jun 26 '16 at 11:39
  • @TheSexiestManinJamaica, The reason it is faster, is because it is wrong. There is no assurance that the `id` returned is related to the `MAX(TIME)`. – Rob Forrest Jun 27 '16 at 07:52
  • 1
    Wondering why you accepted an answer that uses a subquery - when your question clearly asks... " "Is there anyway to order rows before grouping without resorting to a subquery?" ??? – TV-C-1-5 Nov 19 '17 at 21:51
  • @TV-C-15 An answer of "No" wouldn't exactly be of much use now would it? Instead the answer is a better solution to the problem at hand. If you feel that this question and associated answer aren't useful, feel free to down vote and ask your own question. – Rob Forrest Nov 20 '17 at 09:34
  • Yes - "NO" would be very useful - as it is the actual answer and prevents people from wasting their time following answers that DO NOT provide the solution requested. But... if you voted for trump then... I guess non-answers are what you want for sure ;) BUt good point about the answer and my disatisfaction - I think I will formulate my own question towards the actual question being asked here. Thanks :) – TV-C-1-5 Nov 20 '17 at 22:11

13 Answers13

442

Using an ORDER BY in a subquery is not the best solution to this problem.

The best solution to get the max(post_date) by author is to use a subquery to return the max date and then join that to your table on both the post_author and the max date.

The solution should be:

SELECT p1.* 
FROM wp_posts p1
INNER JOIN
(
    SELECT max(post_date) MaxPostDate, post_author
    FROM wp_posts
    WHERE post_status='publish'
       AND post_type='post'
    GROUP BY post_author
) p2
  ON p1.post_author = p2.post_author
  AND p1.post_date = p2.MaxPostDate
WHERE p1.post_status='publish'
  AND p1.post_type='post'
order by p1.post_date desc

If you have the following sample data:

CREATE TABLE wp_posts
    (`id` int, `title` varchar(6), `post_date` datetime, `post_author` varchar(3))
;

INSERT INTO wp_posts
    (`id`, `title`, `post_date`, `post_author`)
VALUES
    (1, 'Title1', '2013-01-01 00:00:00', 'Jim'),
    (2, 'Title2', '2013-02-01 00:00:00', 'Jim')
;

The subquery is going to return the max date and author of:

MaxPostDate | Author
2/1/2013    | Jim

Then since you are joining that back to the table, on both values you will return the full details of that post.

See SQL Fiddle with Demo.

To expand on my comments about using a subquery to accurate return this data.

MySQL does not force you to GROUP BY every column that you include in the SELECT list. As a result, if you only GROUP BY one column but return 10 columns in total, there is no guarantee that the other column values which belong to the post_author that is returned. If the column is not in a GROUP BY MySQL chooses what value should be returned.

Using the subquery with the aggregate function will guarantee that the correct author and post is returned every time.

As a side note, while MySQL allows you to use an ORDER BY in a subquery and allows you to apply a GROUP BY to not every column in the SELECT list this behavior is not allowed in other databases including SQL Server.

Joshua Pinter
  • 45,245
  • 23
  • 243
  • 245
Taryn
  • 242,637
  • 56
  • 362
  • 405
  • 4
    I see what you've done there but that simply returns the date that the most recent post was made, not the entire row for that most recent post. – Rob Forrest Feb 08 '13 at 10:57
  • 1
    @RobForrest that is what the join does. You return the most recent post date in the subquery by author and then join back to your `wp_posts` on both columns to get the full row. – Taryn Feb 08 '13 at 10:58
  • Gotcha! I see that now. In your opinion, which is faster a subquery or a join? – Rob Forrest Feb 08 '13 at 11:01
  • @RobForrest I am not sure I understand your question. This query includes both in order to return the correct result. – Taryn Feb 08 '13 at 11:04
  • Sorry, badly worded there. What makes you say that the subquery and join that you've used is better than the single subquery that I suggested? Is it faster or more accurate in someway? – Rob Forrest Feb 08 '13 at 11:11
  • 7
    @RobForrest For one, when you apply the `GROUP BY` to only one column, there is no guarantee that the values in the other columns will be consistently correct. Unfortunately, MySQL allows this type of SELECT/GROUPing to happen other products don't. Two, the syntax of using an `ORDER BY` in a subquery while allowed in MySQL is not allowed in other database products including SQL Server. You should use a solution that will return the proper result each time it is executed. – Taryn Feb 08 '13 at 11:15
  • @RobForrest It's not a matter of efficiency but of correctness. The `( ORDER BY) GROUP BY` method can give wrong results. – ypercubeᵀᴹ Feb 08 '13 at 11:24
  • Thanks bluefeet, that's a cracking answer just there. – Rob Forrest Feb 08 '13 at 11:28
  • What about wp_posts.id? Assuming it's auto incrementing, one could do INNER JOIN (SELECT max(id) as max_id FROM wp_posts [...] GROUP BY post_author) p2 ON p1.id=p2.max_id. Not sure if I'm missing something though. – Bell Sep 26 '13 at 19:55
  • @Bell Yes, you could do that but the OP was specifically asking about most recent date for each author. If you were sure that the id would give the correct date, then you could use that. – Taryn Sep 26 '13 at 20:02
  • @bluefeet I don't think you need that extra ``WHERE p1.post_status='publish' AND p1.post_type='post'`` in 'master' query, since you already filtered result set of sub query and INNER joined it, so only those rows will be in result – dzona Jan 25 '15 at 23:29
  • @dzona You could but are you sure that will return the exact data that you'd need. Example, the inner join filters on date, author but doesn't include the post type - what if the author and date has multiple post types/post statuses for the same date? You'd incorrectly return those rows. – Taryn Jan 26 '15 at 11:47
  • 2
    For scaling, the compound `INDEX(post_author, post_date)` is important. – Rick James Mar 01 '15 at 16:08
  • @bluefeet isn't it true that you can drop that last set of predicate statements? Since the predicates are in the sub query, the inner join will automatically exclude anything that doesn't meet the inner join criteria. – teuber789 Sep 21 '17 at 20:10
  • @jtcotton63 It is possible, but it may also be possible that `p1` has a `post_type` and `post_status` that are different from the inner query but have the same `post_author` and `post_date`. If you're sure that's not possible, then get rid of the `WHERE` clause. I prefer to be specific in the event it is possible. – Taryn Sep 21 '17 at 20:27
  • Ah, that's true. However, if you were to join on p1.post_id = p2.post_id, you wouldn't need the last predicate clause. – teuber789 Sep 21 '17 at 20:30
  • 1
    @jtcotton63 True, but if you put `post_id` in your inner query, then technically you should group by it as well, which would most likely skew your results. – Taryn Sep 21 '17 at 20:34
  • Yes, you're right again. I just hate that we have to duplicate the where clause - I'd prefer to only do it once. – teuber789 Sep 21 '17 at 20:37
  • @jtcotton63 Yup, I totally agree and will trim when I know it won't mess up the results. – Taryn Sep 21 '17 at 20:37
  • How is this the answer - when the question clearly says "Is there anyway to order rows before grouping >>>>>>>without resorting to a subquery<<<<<<" I myself came here to find out how to order a query before grouping .... without a sub-query And what if you need specific date limit instead of a max date? You will still need to know how to order before grouping. Is there any way to do this WITHOUT the subquery? – TV-C-1-5 Nov 19 '17 at 21:19
  • @bluefeet How use limit and offset here ? I mean, first 10 results, on next request, second 10 results. For example, at first request select newest post for an author and at second request select second newest post. – AliN11 Nov 25 '17 at 11:30
  • @AliN11I'd recommend asking a new question, explaining what you need along with some sample data and the final desired result. – Taryn Nov 27 '17 at 13:47
  • Sometimes a good solution is to get SQL to only do the ordering and then do a single pass through your results to implement your own group by. Technically, ordering a full set of results in the same group is more work than finding just the min or max, therefore, this method is not always as performant, but on a small enough set of results it makes no difference. Also, depending on how you write your sub-query, it might be just as good on performance. – Joel M Dec 17 '20 at 01:38
  • What if I want comparison on two columns, `MAX` on two columns will return the MAX individually and the `ON` statement will not be correct Subquery will be needed with `ORDER BY` on these two columns in this situation – Eiad Samman Apr 10 '21 at 14:42
28

Your solution makes use of an extension to GROUP BY clause that permits to group by some fields (in this case, just post_author):

GROUP BY wp_posts.post_author

and select nonaggregated columns:

SELECT wp_posts.*

that are not listed in the group by clause, or that are not used in an aggregate function (MIN, MAX, COUNT, etc.).

Correct use of extension to GROUP BY clause

This is useful when all values of non-aggregated columns are equal for every row.

For example, suppose you have a table GardensFlowers (name of the garden, flower that grows in the garden):

INSERT INTO GardensFlowers VALUES
('Central Park',       'Magnolia'),
('Hyde Park',          'Tulip'),
('Gardens By The Bay', 'Peony'),
('Gardens By The Bay', 'Cherry Blossom');

and you want to extract all the flowers that grows in a garden, where multiple flowers grow. Then you have to use a subquery, for example you could use this:

SELECT GardensFlowers.*
FROM   GardensFlowers
WHERE  name IN (SELECT   name
                FROM     GardensFlowers
                GROUP BY name
                HAVING   COUNT(DISTINCT flower)>1);

If you need to extract all the flowers that are the only flowers in the garder instead, you could just change the HAVING condition to HAVING COUNT(DISTINCT flower)=1, but MySql also allows you to use this:

SELECT   GardensFlowers.*
FROM     GardensFlowers
GROUP BY name
HAVING   COUNT(DISTINCT flower)=1;

no subquery, not standard SQL, but simpler.

Incorrect use of extension to GROUP BY clause

But what happens if you SELECT non-aggregated columns that are non equal for every row? Which is the value that MySql chooses for that column?

It looks like MySql always chooses the FIRST value it encounters.

To make sure that the first value it encounters is exactly the value you want, you need to apply a GROUP BY to an ordered query, hence the need to use a subquery. You can't do it otherwise.

Given the assumption that MySql always chooses the first row it encounters, you are correcly sorting the rows before the GROUP BY. But unfortunately, if you read the documentation carefully, you'll notice that this assumption is not true.

When selecting non-aggregated columns that are not always the same, MySql is free to choose any value, so the resulting value that it actually shows is indeterminate.

I see that this trick to get the first value of a non-aggregated column is used a lot, and it usually/almost always works, I use it as well sometimes (at my own risk). But since it's not documented, you can't rely on this behaviour.

This link (thanks ypercube!) GROUP BY trick has been optimized away shows a situation in which the same query returns different results between MySql and MariaDB, probably because of a different optimization engine.

So, if this trick works, it's just a matter of luck.

The accepted answer on the other question looks wrong to me:

HAVING wp_posts.post_date = MAX(wp_posts.post_date)

wp_posts.post_date is a non-aggregated column, and its value will be officially undetermined, but it will likely be the first post_date encountered. But since the GROUP BY trick is applied to an unordered table, it is not sure which is the first post_date encountered.

It will probably returns posts that are the only posts of a single author, but even this is not always certain.

A possible solution

I think that this could be a possible solution:

SELECT wp_posts.*
FROM   wp_posts
WHERE  id IN (
  SELECT max(id)
  FROM wp_posts
  WHERE (post_author, post_date) = (
    SELECT   post_author, max(post_date)
    FROM     wp_posts
    WHERE    wp_posts.post_status='publish'
             AND wp_posts.post_type='post'
    GROUP BY post_author
  ) AND wp_posts.post_status='publish'
    AND wp_posts.post_type='post'
  GROUP BY post_author
)

On the inner query I'm returning the maximum post date for every author. I'm then taking into consideration the fact that the same author could theorically have two posts at the same time, so I'm getting only the maximum ID. And then I'm returning all rows that have those maximum IDs. It could be made faster using joins instead of IN clause.

(If you're sure that ID is only increasing, and if ID1 > ID2 also means that post_date1 > post_date2, then the query could be made much more simple, but I'm not sure if this is the case).

Community
  • 1
  • 1
fthiella
  • 48,073
  • 15
  • 90
  • 106
  • That `extension to GROUP By` is an interesting read, thanks for that. – Rob Forrest Feb 08 '13 at 11:24
  • 2
    An example where it fails: [GROUP BY trick has been optimized away](https://kb.askmonty.org/en/group-by-trick-has-been-optimized-away/) – ypercubeᵀᴹ Feb 08 '13 at 11:39
  • 1
    Nonaggregated columns in select expressions with GROUP BY no longer works by default with MySQL 5.7: https://stackoverflow.com/questions/34115174/error-related-to-only-full-group-by-when-executing-a-query-in-mysql. Which IMHO is much safer and forces some people to write more efficient queries. – rink.attendant.6 Mar 11 '17 at 20:21
  • Doesnt this answer use a subquery? Isn't the Original Poster asking for a solution that does NOT use a subquery? – TV-C-1-5 Nov 19 '17 at 21:26
  • @TV-C-15 yes I am using a subquery, but the problem is not the subquery, it's the *resorting* of a subquery which is incorrect and should be avoided – fthiella Nov 19 '17 at 21:30
  • I see... But the question asks "Is there anyway to order rows before grouping without resorting to a subquery?' ?? This is not really an answer to the question then. – TV-C-1-5 Nov 19 '17 at 21:49
  • 1
    @TV-C-15 the problem is with the resorting of the subquery, and i am explaining why resorting of a subquery won't work. Even the accepted answer uses a subquery but it starts explaining why resorting is a bad idea (*Using an ORDER BY in a subquery is not the best solution to this problem*) – fthiella Nov 19 '17 at 21:52
  • Yes - I see that is clear. It is also clear that - the answer uses a subquery with a max statement instead. Thereby - still using a subquery. The question calls for a solution that does NOT use a sub query. The question does not ask - what else can I use in a subquery that is better than sort. The question asks " Is there anyway to order rows before grouping without resorting to a subquery?" – TV-C-1-5 Nov 19 '17 at 21:58
  • @TV-C-15 yes. the only difference is that the subquery the asker is using is wrong, while the subquery i am using is correct. It's like if he asked: i don't want to use technology X because there's a problem Y, what's the best solution? And I said the best solution is still technology X but with a different approach, because your problem Y is caused by something else. – fthiella Nov 19 '17 at 22:02
  • I am sorry - but you are wrong. It is not "LIKE" He asked something... He asked very specifically... "Is there anyway to order rows before grouping >without resorting to a subquery" Therefore - any answer that uses a subquery - is not answering the question. The correct answer should not use a subquery - or perhaps there truly NOT a way to do this WITHOUT using a subquery??? – TV-C-1-5 Nov 19 '17 at 22:10
  • there's a way with LEFT JOIN, and there are plenty of other questions and answers that covers that, but it's not necessarily more performant or appropriate. On this specific context it was more useful and interesting to explain what was wrong with the resorting of the subquery rather than answering literally. I think the op appreciated this answer. – fthiella Nov 19 '17 at 22:25
  • This solution is crashing mysql server / service. I have checked it with 10 million records, not recommend this solution. Using `IN` in this case is very worst. – Kamlesh Sep 17 '21 at 12:29
11

What you are going to read is rather hacky, so don't try this at home!

In SQL in general the answer to your question is NO, but because of the relaxed mode of the GROUP BY (mentioned by @bluefeet), the answer is YES in MySQL.

Suppose, you have a BTREE index on (post_status, post_type, post_author, post_date). How does the index look like under the hood?

(post_status='publish', post_type='post', post_author='user A', post_date='2012-12-01') (post_status='publish', post_type='post', post_author='user A', post_date='2012-12-31') (post_status='publish', post_type='post', post_author='user B', post_date='2012-10-01') (post_status='publish', post_type='post', post_author='user B', post_date='2012-12-01')

That is data is sorted by all those fields in ascending order.

When you are doing a GROUP BY by default it sorts data by the grouping field (post_author, in our case; post_status, post_type are required by the WHERE clause) and if there is a matching index, it takes data for each first record in ascending order. That is the query will fetch the following (the first post for each user):

(post_status='publish', post_type='post', post_author='user A', post_date='2012-12-01') (post_status='publish', post_type='post', post_author='user B', post_date='2012-10-01')

But GROUP BY in MySQL allows you to specify the order explicitly. And when you request post_user in descending order, it will walk through our index in the opposite order, still taking the first record for each group which is actually last.

That is

...
WHERE wp_posts.post_status='publish' AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author DESC

will give us

(post_status='publish', post_type='post', post_author='user B', post_date='2012-12-01') (post_status='publish', post_type='post', post_author='user A', post_date='2012-12-31')

Now, when you order the results of the grouping by post_date, you get the data you wanted.

SELECT wp_posts.*
FROM wp_posts
WHERE wp_posts.post_status='publish' AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author DESC
ORDER BY wp_posts.post_date DESC;

NB:

This is not what I would recommend for this particular query. In this case, I would use a slightly modified version of what @bluefeet suggests. But this technique might be very useful. Take a look at my answer here: Retrieving the last record in each group

Pitfalls: The disadvantages of the approach is that

  • the result of the query depends on the index, which is against the spirit of the SQL (indexes should only speed up queries);
  • index does not know anything about its influence on the query (you or someone else in future might find the index too resource-consuming and change it somehow, breaking the query results, not only its performance)
  • if you do not understand how the query works, most probably you'll forget the explanation in a month and the query will confuse you and your colleagues.

The advantage is performance in hard cases. In this case, the performance of the query should be the same as in @bluefeet's query, because of amount of data involved in sorting (all data is loaded into a temporary table and then sorted; btw, his query requires the (post_status, post_type, post_author, post_date) index as well).

What I would suggest:

As I said, those queries make MySQL waste time sorting potentially huge amounts of data in a temporary table. In case you need paging (that is LIMIT is involved) most of the data is even thrown off. What I would do is minimize the amount of sorted data: that is sort and limit a minimum of data in the subquery and then join back to the whole table.

SELECT * 
FROM wp_posts
INNER JOIN
(
  SELECT max(post_date) post_date, post_author
  FROM wp_posts
  WHERE post_status='publish' AND post_type='post'
  GROUP BY post_author
  ORDER BY post_date DESC
  -- LIMIT GOES HERE
) p2 USING (post_author, post_date)
WHERE post_status='publish' AND post_type='post';

The same query using the approach described above:

SELECT *
FROM (
  SELECT post_id
  FROM wp_posts
  WHERE post_status='publish' AND post_type='post'
  GROUP BY post_author DESC
  ORDER BY post_date DESC
  -- LIMIT GOES HERE
) as ids
JOIN wp_posts USING (post_id);

All those queries with their execution plans on SQLFiddle.

Community
  • 1
  • 1
newtover
  • 31,286
  • 11
  • 84
  • 89
  • That's an interesting technique you've got going there. Two things: you say don't try this at home, what are the potential pitfalls? secondly, you mention a slightly modified version of bluefeet's answer, what would that be? – Rob Forrest Feb 12 '13 at 16:51
  • Thanks for that, its interesting to see someone attacking the problem a different way. As my data set is nowhere near your 18M+ rows, I don't think performance is as crucial as maintainability so I think your later options are probably more suitable. I like the idea of the limit on the inside of the subquery. – Rob Forrest Feb 13 '13 at 09:24
9

Try this one. Just get the list of latest post dates from each author. Thats it

SELECT wp_posts.* FROM wp_posts WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post' AND wp_posts.post_date IN(SELECT MAX(wp_posts.post_date) FROM wp_posts GROUP BY wp_posts.post_author) 
sanchitkhanna26
  • 2,143
  • 8
  • 28
  • 42
  • @Rob Forrest , check my solution. It resolves your question, hopefully! – sanchitkhanna26 Feb 08 '13 at 11:10
  • 1
    I'm sorry, I don't think that would work. For example if both author 1 and author 2 publishes something on 01/02/13 and then author 2 posts something new on the 08/02/13, all 3 posts will be returned. Yes the datetime field includes the time so the situation is less likely but by no means is it guaranteed on a large enough dataset. – Rob Forrest Feb 08 '13 at 11:16
  • +1 for using the `post_date IN (select max(...) ...)`. This is more efficient than doing a group by in a sub select, see https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html – Seaux Mar 12 '14 at 19:07
  • just to clarify, that is only more optimal if you have post_author indexed. – Seaux Mar 12 '14 at 19:12
  • 1
    `IN ( SELECT ... )` is much less efficient than the equivalent JOIN. – Rick James Mar 01 '15 at 16:07
5

Just use the max function and group function

    select max(taskhistory.id) as id from taskhistory
            group by taskhistory.taskid
            order by taskhistory.datum desc
  • 3
    What if the one with the highest id isn't the most recently posted? An example of this could be that the author held his post in draft for a long period of time before posting it. – Rob Forrest Nov 17 '16 at 08:45
2

No. It makes no sense to order the records before grouping, since grouping is going to mutate the result set. The subquery way is the preferred way. If this is going too slow you would have to change your table design, for example by storing the id of of the last post for each author in a seperate table, or introduce a boolean column indicating for each author which of his post is the last one.

Dennisch
  • 6,888
  • 1
  • 17
  • 32
  • Dennish, how would you respond to Bluefeet's comments that this type of query is not correct SQL syntax and therefore not portable across database platforms? There are also concerns that there is no guarantee that this would produce the correct results every time. – Rob Forrest Feb 08 '13 at 11:35
1

Not sure whether this has already been suggested or not, but you might now use SQL window functions :

SELECT * FROM (
      SELECT wp_posts.*, ROW_NUMBER() OVER (PARTITION BY wp_posts.post_author ORDER BY post_date DESC) rank
      FROM wp_posts
      WHERE wp_posts.post_status = 'publish'
      AND wp_posts.post_type = 'post'
  ) AS T
WHERE rank = 1

All rows gets "ranked", then you just need to select each 1st rows.

I admit I do not have any clue about performance but as far I as know it should be quite acceptable.

nicolas
  • 712
  • 4
  • 15
0

Just to recap, the standard solution uses an uncorrelated subquery and looks like this:

SELECT x.*
  FROM my_table x
  JOIN (SELECT grouping_criteria,MAX(ranking_criterion) max_n FROM my_table GROUP BY grouping_criteria) y
    ON y.grouping_criteria = x.grouping_criteria
   AND y.max_n = x.ranking_criterion;

If you're using an ancient version of MySQL, or a fairly small data set, then you can use the following method:

SELECT x.*
  FROM my_table x
  LEFT
  JOIN my_table y
    ON y.joining_criteria = x.joining_criteria
   AND y.ranking_criteria < x.ranking_criteria
 WHERE y.some_non_null_column IS NULL;  
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • When you say ancient version, what version of MySQL would this run on? And sorry no, the dataset is quite large in my example. – Rob Forrest Feb 08 '13 at 11:31
  • It will work (slowly) on any version. Older versions cannot use subqueries. – Strawberry Feb 08 '13 at 12:37
  • Yep, the method #2 (the version I've tried is from [here](http://stackoverflow.com/a/1066503/722036)) won't work on a large dataset (millions of rows), throws a *lost connection* error. Method #1 takes ~15 seconds to execute a query. I initially wanted to avoid using nested queries, but this made me reconsider. Thank you! – ᴍᴇʜᴏᴠ Jun 26 '16 at 11:06
  • @TheSexiestManinJamaica Yes. Not much has changed in 3.5 years. Assuming a query is in itself efficient, then the time that query takes to execute is largely dependent upon the size of the dataset, the arrangement of indexes, and available hardware. – Strawberry Jun 26 '16 at 11:11
0

Just in case if it is actual. I did many times something like this:

select * from 
  (select max(some_quantity) over (partition by id1, id2) as max_quantity, t.*
  from table_name t) tt 
where tt.max_quantity=tt.some_quantity;

This is grouping having the condition maximum of the field some_quantity.

Peter Csala
  • 17,736
  • 16
  • 35
  • 75
0

Here is my solution with User-Defined Variable to get consistent results and even without GROUP BY. The goal was to get the whole row, not just the maximum value of one cell in a row. See example below:

SET @product_id := 0;

SELECT
    products.order_code,
    purchases.`date`,
    purchases.price
FROM products
LEFT JOIN (       
    SELECT
        purchases.`date`,
        purchases.price,
        IF(@product_id = purchases.product_id, 0, 1) AS is_last,
        @product_id := purchases.product_id AS product_id
    FROM purchases
    ORDER BY purchases.product_id ASC, purchases.id DESC
) purchases ON products.id = purchases.product_id
WHERE purchases.is_last = 1
ORDER BY products.order_code ASC;

I am not sure about the performance, but it was 0.1 sec on 50000 rows of purchases table. Please let me know if i can make any performance improvements.

pj.cz
  • 1
  • Thanks for the contribution but it doesn't really answer the question posed. As for performance, you could do away with the `purchases.is_last` business and insert `where purchases.product_id = @product_id` in the subquery? That would result in fewer rows for the left join to operate on. – Rob Forrest Feb 14 '22 at 12:31
0

You can try

SELECT wp_posts.*, MIN(DISTINCT wp_posts.post_date) FROM wp_posts
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author

the Min function is grouping and ordering in this case

-1

** Sub queries may have a bad impact on performance when used with large datasets **

Original query

SELECT wp_posts.*
FROM   wp_posts
WHERE  wp_posts.post_status = 'publish'
       AND wp_posts.post_type = 'post'
GROUP  BY wp_posts.post_author
ORDER  BY wp_posts.post_date DESC; 

Modified query

SELECT p.post_status,
       p.post_type,
       Max(p.post_date),
       p.post_author
FROM   wp_posts P
WHERE  p.post_status = "publish"
       AND p.post_type = "post"
GROUP  BY p.post_author
ORDER  BY p.post_date; 

becasue i'm using max in the select clause ==> max(p.post_date) it is possible to avoid sub select queries and order by the max column after the group by.

Pரதீப்
  • 91,748
  • 19
  • 131
  • 172
guykaplan
  • 145
  • 1
  • 3
  • 1
    This indeed returns the most recent post_date per author but there is no guarantee that the rest of the data that is returned relates to the post with the most recent post_date. – Rob Forrest Nov 05 '14 at 16:17
  • @RobForrest -> I don't understand why? it is a good idea to elaborate your answer and just throw out claims. As far as I understand the data is guaranteed to be related as I use where clause to filter the related data. – guykaplan Nov 06 '14 at 09:14
  • 1
    To an extent, you are entirely correct, each of the 4 fields you're selecting will relate to that maximum post_date, but this doesn't answer the question that was asked. For instance, if you added the post_id, or the contents of the post, then those columns would not be assured to be from the same record as the maximum date. To get your query above to return the rest of the post's details you would have to run a second query. If the question was about finding the date of the most recent post, then yes you're answer would be fine. – Rob Forrest Nov 06 '14 at 13:08
  • @guykaplan, Subqueries are not slow. The size of the data set doesn't matter. It depends on how you use it. See http://www.percona.com/blog/2010/03/18/when-the-subselect-runs-faster/ – Pacerier Apr 09 '15 at 14:24
  • @Pacerier: the article indeed shows how you can get performance benefit from sub-queries, but I would love to see you convert the given scenario to perform better. and Data size is important, again in the given article you posted you are assuming there is only one table to work with. data size is not by row size, is by complexity size. having said that, if you are working with really large table (not many tables involved) sub-query may perform much better. – guykaplan Jun 01 '15 at 14:12
-4

First, don't use * in select, affects their performance and hinder the use of the group by and order by. Try this query:

SELECT wp_posts.post_author, wp_posts.post_date as pdate FROM wp_posts
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author           
ORDER BY pdate DESC

When you don't specifies the table in ORDER BY, just the alias, they will order the result of the select.

Bruno Nardini
  • 461
  • 5
  • 13