Your solution makes use of an extension to the GROUP BY clause that lets you group by some fields (in this case, just post_author):
GROUP BY wp_posts.post_author
and select nonaggregated columns:
SELECT wp_posts.*
that are neither listed in the GROUP BY clause nor used in an aggregate function (MIN, MAX, COUNT, etc.).
Correct use of extension to GROUP BY clause
This is useful when, within each group, every non-aggregated column has the same value on all rows.
For example, suppose you have a table GardensFlowers (name of the garden, flower that grows in the garden):
INSERT INTO GardensFlowers VALUES
('Central Park', 'Magnolia'),
('Hyde Park', 'Tulip'),
('Gardens By The Bay', 'Peony'),
('Gardens By The Bay', 'Cherry Blossom');
and you want to extract all the flowers that grow in gardens where more than one flower grows. Then you have to use a subquery, for example:
SELECT GardensFlowers.*
FROM GardensFlowers
WHERE name IN (SELECT name
FROM GardensFlowers
GROUP BY name
HAVING COUNT(DISTINCT flower)>1);
If you instead need to extract the flowers that are the only flower in their garden, you could just change the HAVING condition to HAVING COUNT(DISTINCT flower)=1, but MySQL also allows you to use this:
SELECT GardensFlowers.*
FROM GardensFlowers
GROUP BY name
HAVING COUNT(DISTINCT flower)=1;
No subquery: it is not standard SQL, but it is simpler.
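Where the shortcut is rejected (by a stricter MySQL configuration or by another database), the same rows can be obtained in standard SQL by wrapping the non-grouped column in an aggregate. This is only a sketch: the CREATE TABLE statement and its column types are my assumption, since only the INSERT is given above. Because each selected group contains exactly one distinct flower, MIN(flower) is that flower.

```sql
-- Assumed table definition (column types are a guess, not from the question).
CREATE TABLE GardensFlowers (
    name   VARCHAR(100) NOT NULL,
    flower VARCHAR(100) NOT NULL
);

-- Standard SQL: every selected column is either grouped or aggregated.
SELECT name, MIN(flower) AS flower
FROM GardensFlowers
GROUP BY name
HAVING COUNT(DISTINCT flower) = 1;
```

With the sample rows above, this returns Central Park with Magnolia and Hyde Park with Tulip.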
Incorrect use of extension to GROUP BY clause
But what happens if you SELECT non-aggregated columns whose values are not equal on every row of the group? Which value does MySQL choose for that column?
It looks like MySQL always chooses the FIRST value it encounters.
To make sure that the first value it encounters is exactly the value you want, you need to apply the GROUP BY to an ordered query, hence the need for a subquery. You can't do it otherwise.
Given the assumption that MySQL always chooses the first row it encounters, you are correctly sorting the rows before the GROUP BY. Unfortunately, if you read the documentation carefully, you'll notice that this assumption does not hold.
When you select non-aggregated columns whose values differ within a group, MySQL is free to choose any value, so the value it actually returns is indeterminate.
I see that this trick for getting the first value of a non-aggregated column is used a lot, and it almost always works. I use it myself sometimes (at my own risk). But since the behaviour is not documented, you can't rely on it.
This link (thanks ypercube!), "GROUP BY trick has been optimized away", shows a situation in which the same query returns different results in MySQL and MariaDB, probably because of a different optimization engine.
So, if this trick works, it's just a matter of luck.
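If you would rather have MySQL refuse such queries than silently pick an arbitrary value, the ONLY_FULL_GROUP_BY SQL mode does that. A minimal sketch, assuming you are allowed to change the SQL mode for your session; note that the statement as written replaces any other modes that were set:

```sql
-- Replaces the session's SQL mode; any other modes are dropped for this session.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Now fails with an error instead of returning an arbitrary row per author:
-- the selected columns are neither grouped nor aggregated, and post_author
-- is not a key of wp_posts, so they are not determined by it.
SELECT wp_posts.*
FROM wp_posts
GROUP BY wp_posts.post_author;
```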
The accepted answer on the other question looks wrong to me:
HAVING wp_posts.post_date = MAX(wp_posts.post_date)
wp_posts.post_date is a non-aggregated column, so its value is officially indeterminate, although it will likely be the first post_date encountered. But since the GROUP BY trick is applied to an unordered table, there is no telling which post_date is encountered first.
It will probably return posts that are the only post of their author, but even this is not certain.
A possible solution
I think that this could be a possible solution:
SELECT wp_posts.*
FROM wp_posts
WHERE id IN (
  -- newest ID per author, among that author's posts with the latest date
  SELECT max(id)
  FROM wp_posts
  WHERE (post_author, post_date) IN (
    -- latest post_date per author; one row per author, hence IN rather than =
    SELECT post_author, max(post_date)
    FROM wp_posts
    WHERE wp_posts.post_status='publish'
          AND wp_posts.post_type='post'
    GROUP BY post_author
  ) AND wp_posts.post_status='publish'
    AND wp_posts.post_type='post'
  GROUP BY post_author
);
In the innermost query I return the maximum post date for every author. Since the same author could theoretically have two posts with the same timestamp, I then keep only the maximum ID among them. Finally, I return all rows that have those maximum IDs. It could be made faster by using joins instead of the IN clauses.
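A join-based variant of the same idea might look like the sketch below. The aliases p, w, latest and newest are names I made up for illustration, and whether it is actually faster depends on your MySQL version and indexes, so treat it as something to measure rather than a guaranteed improvement.

```sql
SELECT p.*
FROM wp_posts AS p
JOIN (
    -- Highest ID per author among that author's most recent posts
    SELECT MAX(w.id) AS id
    FROM wp_posts AS w
    JOIN (
        -- Latest publication date per author
        SELECT post_author, MAX(post_date) AS post_date
        FROM wp_posts
        WHERE post_status = 'publish'
          AND post_type = 'post'
        GROUP BY post_author
    ) AS latest
      ON latest.post_author = w.post_author
     AND latest.post_date = w.post_date
    WHERE w.post_status = 'publish'
      AND w.post_type = 'post'
    GROUP BY w.post_author
) AS newest
  ON newest.id = p.id;
```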
(If you're sure that ID is only increasing, and that ID1 > ID2 also means post_date1 > post_date2, then the query could be made much simpler, but I'm not sure if this is the case.)
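Under that assumption, and only under it, the date comparison can be dropped altogether, because the highest ID per author is then also the most recent post. A sketch:

```sql
-- Valid only if a larger id always implies a later post_date.
SELECT wp_posts.*
FROM wp_posts
WHERE id IN (
    SELECT MAX(id)
    FROM wp_posts
    WHERE post_status = 'publish'
      AND post_type = 'post'
    GROUP BY post_author
);
```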