1

I need a unique list of parent_threads based on the desc order of postID, postID is always unique but often the parent_thread field is the same for multiple posts.

So what i need is a list of posts in order they were replied to.

so for example in the image below i need to disregard posts 400 and 399 as they're repeats. i've got a query to work using a subquery but the problem with this subquery is that it can sometimes take up to 1 second to query, i was wondering if there was a more efficient way to do this. i've tried group by and distinct but keep getting the wrong results.

imge of the table

Here is the query that i have which produces the results i want, which is often slow.

SELECT `postID`
FROM `posts`
ORDER BY
(
    SELECT MAX(`postID`)
    FROM `posts` `sub`
    WHERE `sub`.`parent_thread` = `posts`.postID
)
DESC
O. Jones
  • 103,626
  • 17
  • 118
  • 172
foofighta
  • 13
  • 2
  • 1
    Welcome to SO. Please see: [Why should I provide an MCRE for what seems to me to be a very simple SQL query?](https://meta.stackoverflow.com/questions/333952/why-should-i-provide-a-minimal-reproducible-example-for-a-very-simple-sql-query) – Strawberry Oct 22 '20 at 12:14
  • Can you provide an example of the desired result – Fuzzy Oct 22 '20 at 12:25
  • The query looks fine to me. You should have an index on `(parent_thread, postid)` to get quickly to the last sub postid per parent post. – Thorsten Kettner Oct 22 '20 at 12:45

1 Answers1

0

Your subquery is known as a dependent subquery. They can make queries very slow because they get repeated a lot.

JOIN to your subquery instead. That way it will be used just once, and things will speed up. Try this subquery to generate a list of max post ids, one for each parent thread.

                   SELECT MAX(postID) maxPostID, parent_thread
                     FROM posts
                    GROUP BY parent_thread

Then use it in your main query like this

SELECT posts.postID
FROM posts
LEFT JOIN (
                       SELECT MAX(postID) maxPostID, parent_thread
                         FROM posts
                        GROUP BY parent_thread
     ) m ON posts.postID = m.parent_thread
ORDER BY m.maxPostID DESC
O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • This query is slightly different from the original in that point that you are only selecting posts that have sub posts. Make this a `LEFT JOIN` to include posts that don't (yet) have sub posts. As your answer has been accepted, it seems it solved the performance issues. I'd be interested in whether the same is true for the query with an outer join. (Because ideally the optimizer should come up with the same execution plan then.) If only posts with sub posts shall be considered and only their IDs are to be selected, then joining the posts table is superfluous. – Thorsten Kettner Oct 22 '20 at 14:04