1

I've been searching for a way to correct my queries after the last MySQL update. I understand the Group By column isn't unique but I can't figure out how to update the query to comply with the strict rule and still retain the result I'm looking for. Does anyone have a suggestion?

SELECT 
    $SQLTable_Posts.`post_id`, 
    $SQLTable_Posts.`profile_id`, 
    $SQLTable_Posts.`post_title`, 
    $SQLTable_Posts.`post_slug`, 
    $SQLTable_Posts.`post_date`
FROM $SQLTable_Posts
WHERE $SQLTable_Posts.`post_date` > DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY $SQLTable_Posts.`profile_id`
ORDER BY $SQLTable_Posts.`post_date` DESC

I would like to simply return the latest post made by each profile. For example, my database looks like this;

<table><tbody><tr><th>post_id</th><th>profile_id post_date</th><th> </th></tr><tr><td>0001</td><td>0001</td><td>2019-03-12 00:00:00</td></tr><tr><td>0002</td><td>0002</td><td>2019-03-12 00:00:00</td></tr><tr><td>0003</td><td>0001</td><td>2019-03-12 01:00:00</td></tr><tr><td>0004</td><td>0004</td><td>2019-03-12 00:00:00</td></tr></tbody></table>

I would like the results to return this;

<table><tbody><tr><th>post_id</th><th>profile_id post_date</th><th> </th></tr><tr><td>0003</td><td>0001</td><td>2019-03-12 01:00:00</td></tr><tr><td>0002</td><td>0002</td><td>2019-03-12 00:00:00</td></tr><tr><td>0004</td><td>0004</td><td>2019-03-12 00:00:00</td></tr></tbody></table>
GMB
  • 216,147
  • 25
  • 84
  • 135
epool
  • 266
  • 1
  • 13
  • 1
    what is the result you're looking for? give us an example – Sindhara Mar 12 '19 at 21:19
  • Every column that you SELECT should either be aggregated (COUNT, SUM, etc) or in your GROUP BY clause. – Patrick Q Mar 12 '19 at 21:24
  • and how should sql know if it should take post_id 0003 or 0001 for profile_id 0001 ? and which should be chosen? – Sindhara Mar 12 '19 at 21:31
  • @kun-chan I get what your saying and appreciate you making me think. Previously because I sorted by the date and grouped by the user it would grab the latest, however I do not see any documentation explaining how to tell sql which one to grab first. – epool Mar 12 '19 at 21:34
  • If `profile_id` is the primary key you shouldn't need to do anything special, since all the other columns are functionally dependent on it. – Barmar Mar 12 '19 at 21:45

1 Answers1

1

I would like to simply return the latest post made by each profile.

One way to do this without messing around with GROUP BY is to use a correlated subquery with a NOT EXISTS condition to ensure that each record being selected is the most recent in each group of record having the same profile_id (I used column post_date to sort the records):

SELECT 
    `post_id`, 
    `profile_id`, 
    `post_title`, 
    `post_slug`, 
    `post_date`
FROM $SQLTable_Posts p
WHERE 
    `post_date` > DATE_SUB(NOW(), INTERVAL 24 HOUR)
    AND NOT EXISTS (
        SELECT 1
        FROM $SQLTable_Posts p1
        WHERE p1.`profile_id` = p.`profile_id` AND p1.`post_date` > p.`post_date`
    )
ORDER BY `post_date` DESC

NB: I simplified the query to remove the lengthy table prefixes, which are not needed here (and/or can be replaced with short table aliases).


If you happen to be using MySQL 8.0, this is straight-forward using ROW_NUMBER():

SELECT *
FROM (
    SELECT 
        `post_id`, 
        `profile_id`, 
        `post_title`, 
        `post_slug`, 
        `post_date`,
        ROW_NUMBER() OVER(PARTITION BY `profile_id` ORDER BY `post_date` DESC) rn
    FROM $SQLTable_Posts p
    WHERE `post_date` > DATE_SUB(NOW(), INTERVAL 24 HOUR)
) x WHERE rn = 1
ORDER BY `post_date` DESC
GMB
  • 216,147
  • 25
  • 84
  • 135
  • This solution worked perfectly thank you. However, I just noticed that if there is two posts with the exact same timestamp they both show up. We should update this to filter those as well. – epool Mar 12 '19 at 23:31
  • @epool: ok, for this we need to adapt the query a little. Which version of the query are your using, the one with `NOT EXISTS` or the one with `ROW_NUMBER()`? – GMB Mar 12 '19 at 23:33
  • The one with NOT EXISTS – epool Mar 12 '19 at 23:34
  • 1
    OK so you can change the `WHERE` clause of the subquery to: `WHERE p1.profile_id = p.profile_id AND (p1.post_date > p.post_date OR (p1.post_date = p.post_date AND p1.post_id > p.post_id)`. When there are posts with the same timestamp, the one with the greatest `id` will be selected. – GMB Mar 12 '19 at 23:37
  • 1
    Flawless! Excellent solution. – epool Mar 12 '19 at 23:46