tl;dr - lots of accepted stackoverflow answers suggest using a subquery to affect the row returned by a GROUP BY
clause. While this works, is it the best advice?
I understand there are many questions already about how to retrieve a specific row in a GROUP BY
statement. Most of them revolve around using a subquery in the FROM
clause. The subquery will order the table appropriately and the group by will be run against the now-ordered temporary table. Some examples,
PostgreSQL removes the need for the subquery with the distinct on()
clause.
However, what I'm not understanding in any of these cases is how badly I'm shooting myself in the foot trying to do something the system may not have originally been designed for. Take the following two examples in PostgreSQL and MySQL,
In both cases I have a table of posts that contain multiple versions of the same post (signified by its UUID). I want to select the most recently published
version of each post ordered by it's created_at
field.
My biggest concern is that given the MySQL approach a temporary table is necessary. Ratchet this up to "web scale" (lolz) and I'm wondering if I'm in for a world of hurt. Should I rethink my schema or are there ways to optimize the subquery-parentquery relationship enough that it'll be alright?