SQL query checking columns in first table or looking up rows in second table

Question

So I have the following situation:

I have a post model which always has 2 or 4 arguments

class Post < ActiveRecord::Base
  # has two columns, status and published (boolean)
  has_many :arguments
end

and an argument model

class Argument < ActiveRecord::Base
  # has moderated column (boolean) and text
  belongs_to :post
end

and I wonder if I can write a query to get posts with following conditions:

1) the post has unmoderated arguments and first of those, ordered by id (not updated_at) belongs to user with id = 5 (for instance)

------ OR ------

2) the post status = 1 and published = true

PS.: I am using PostgreSQL.

score 1 · Accepted Answer · edited May 23 '17 at 12:28

1

Translated to SQL:

SELECT p.*
FROM   posts p
LEFT   JOIN (
   SELECT DISTINCT ON (post_id)
          post_id, user_id
   FROM   arguments
   WHERE  moderated = FALSE
   ORDER  BY post_id, id
   ) a ON a.post_id = p.id AND a.user_id = 5
WHERE  a.post_id IS NOT NULL OR        -- condition 1
      (a.status = 1 and a.published);  -- condition 2

Details for DISTINCT ON:

Select first row in each GROUP BY group?

Why the `LEFT JOIN`?

If there was just condition 1), we could simplify:

SELECT p.*
FROM   posts p
JOIN  (
   SELECT DISTINCT ...
   ) a ON a.post_id = p.id AND a.user_id = 5;

But you added the alternative (not additional) condition 2). If we would use a plain [INNER] JOIN, posts not passing condition 1) would be dropped from the selection immediately and never get their second chance. I added redundant parentheses (operator precedence would work for us without parentheses) and a line break to point out that this is "the other way in":

(a.status = 1 and a.published)

Of course, we have to add the check for condition 1) now:

a.post_id IS NOT NULL

This works without duplicating rows because the subquery returns exactly 1 or 0 rows per post (and not more).

This is one way to solve it. Should be an excellent choice for only few attributes per post (2 - 4) and a non-trivial number of qualifying attributes. Depending on your actual data distribution there may be other, faster query techniques:

Optimize GROUP BY query to retrieve latest record per user

edited May 23 '17 at 12:28

Community

1
1

answered Mar 30 '15 at 14:47

Erwin Brandstetter

605,456
145
1,078
1,228

I've changed it a bit and after a little testing, I made it work :) Thank you so much for your time and help and I apologize for a lousy written question. The only thing I want to point out is that I needed to use tables' names in plural form, like 'FROM posts p' and the same with the arguments table – NoDisplayName Mar 30 '15 at 15:11
@JustMichael: That seems to be the Ruby naming convention. Personally, I prefer singular terms for tables. the plural adds nothing but longer names. (I also wouldn't use the non-descriptive `id` as column name, like some ORMs do.) – Erwin Brandstetter Mar 30 '15 at 15:31
Yeah it's a rails convention and thanks for explaining – NoDisplayName Mar 31 '15 at 02:29
it works as expected but I wanted to ask one more thing, I've read some info saying that left join returns left records even if the right ones are null so I wonder if this query can return posts that have no joined arguments? Or what am I missing? – NoDisplayName Mar 31 '15 at 03:08
@JustMichael: What you have heard about `LEFT JOIN` is exactly right. I added a rationale. – Erwin Brandstetter Mar 31 '15 at 03:31

SQL query checking columns in first table or looking up rows in second table

1 Answers1

Why the LEFT JOIN?

Why the `LEFT JOIN`?