Find rows in relation with at least n rows in a different table without joins

Question

I have a table as such (tbl):

+----+------+-----+
| pk | attr | val |
+----+------+-----+
|  0 | ohif |   4 |
|  1 | foha |  56 |
|  2 | slns |   2 |
|  3 | faso |  11 |
+----+------+-----+

And another table in n-to-1 relationship with tbl (tbl2):

+----+-----+
| pk | rel |
+----+-----+
|  0 |   0 |
|  1 |   1 |
|  2 |   0 |
|  3 |   2 |
|  4 |   2 |
|  5 |   3 |
|  6 |   1 |
|  7 |   2 |
+----+-----+

(tbl2.rel -> tbl.pk.)

I would like to select only the rows from tbl which are in relationship with at least n rows from tbl2.

I.e., for n = 2, I want this table:

+----+------+-----+
| pk | attr | val |
+----+------+-----+
|  0 | ohif |   4 |
|  1 | foha |  56 |
|  2 | slns |   2 |
+----+------+-----+

This is the solution I came up with:

SELECT DISTINCT ON (tbl.pk) tbl.*
FROM (
    SELECT tbl.pk
    FROM tbl
    RIGHT OUTER JOIN tbl2 ON tbl2.rel = tbl.pk
    GROUP BY tbl.pk
    HAVING COUNT(tbl2.*) >= 2  -- n
) AS tbl_candidates
LEFT OUTER JOIN tbl ON tbl_candidates.pk = tbl.pk

Can it be done without selecting the candidates with a subquery and re-joining the table with itself?

I'm on Postgres 10. A standard SQL solution would be better, but a Postgres solution is acceptable.

@philipxy I find it ironic that in asking me to be particular you are being so obviously generic: the problem is stated precisely, in a general form, and accompanied by a fully working solution. I though, apparently mistakenly, that specifically asking for a solution not involving `join`s, would suffice to show that I _did_ research the problem before posting the question, and that this is not a "please do the query for me" faq but an optimization question. Which is also the reason why the question you linked is not a duplicate. — rrrrrrrrrrrrrrrr, Jun 05 '19 at 09:02
Since the question has been marked as possible duplicate, I edited it. I hope it's clearer now. — rrrrrrrrrrrrrrrr, Jun 05 '19 at 09:08
I was biased by your (original) title. But there are a pile of duplicates for your closing question & anyway answers for that question are answers for that title--whatever you researched, there are still a lot of easily found duplicates. PS For other ways to write subqueries via joins & joins via subqueries, there are more faqs. — philipxy, Jun 05 '19 at 09:25
@philipxy You have a point on the title, which I have corrected. I still cannot find a duplicate to my question. — rrrrrrrrrrrrrrrr, Jun 05 '19 at 09:28
The duplicate link I gave was to the wrong post. Now googling (say) 'site:stackoverflow.com sql Find rows with at least n rows in another table' ... first hit: — philipxy, Jun 05 '19 at 09:52
Possible duplicate of [SQL: Select rows with a column value that occurs at least N times?](https://stackoverflow.com/questions/5760335/sql-select-rows-with-a-column-value-that-occurs-at-least-n-times) — philipxy, Jun 05 '19 at 09:54
You might be interested in [this re "self-join" & relational querying](https://stackoverflow.com/a/37384306/3404097). All queries like yours for fixed n can be phrased by the 1st table joined to an n-ary self-join of the 2nd table because you want rows that are in the table & for which there exist n rows matching in the 2nd. So you don't need to self-join the 1st. Instead of the join + self joins you can join the 2 & group & count matches. PS SQL WHERE-IN-subquery is just a way to phrase a join--it doesn't meaningfully avoid joining. PS [2nd hit](https://stackoverflow.com/q/47309139/3404097) — philipxy, Jun 05 '19 at 10:04

Shawn.X · Answer 1 · 2019-06-04T10:09:30.260

0

OK, just join once, as below:

select
    t1.pk,
    t1.attr,
    t1.val
from
    tbl t1
join
    tbl2 t2 on t1.pk = t2.rel
group by
    t1.pk,
    t1.attr,
    t1.val
having(count(1)>=2) order by t1.pk;
 pk | attr | val 
----+------+-----
  0 | ohif |   4
  1 | foha |  56
  2 | slns |   2
(3 rows)

Or just join once and use CTE(with clause), as below:

with tmp as (
select rel from tbl2 group by rel having(count(1)>=2)
)
select b.* from tmp t join tbl b on t.rel = b.pk order by b.pk;
 pk | attr | val 
----+------+-----
  0 | ohif |   4
  1 | foha |  56
  2 | slns |   2
(3 rows)

Is the SQL clearer?

edited Jun 04 '19 at 10:09

answered Jun 04 '19 at 10:05

Shawn.X

1,323
6
15

Isn't your SQL functionally the same as mine? The only difference is that I don't explicitly use a temp table. – rrrrrrrrrrrrrrrr Jun 04 '19 at 10:08
@RiccardoBestetti reEdited the answer, check please. Just see the first SQL. :) – Shawn.X Jun 04 '19 at 10:09
Ok, makes sense now! Thank you. – rrrrrrrrrrrrrrrr Jun 04 '19 at 10:14

Find rows in relation with at least n rows in a different table without joins

1 Answers1