-1

I have a table as such (tbl):

+----+------+-----+
| pk | attr | val |
+----+------+-----+
|  0 | ohif |   4 |
|  1 | foha |  56 |
|  2 | slns |   2 |
|  3 | faso |  11 |
+----+------+-----+

And another table in n-to-1 relationship with tbl (tbl2):

+----+-----+
| pk | rel |
+----+-----+
|  0 |   0 |
|  1 |   1 |
|  2 |   0 |
|  3 |   2 |
|  4 |   2 |
|  5 |   3 |
|  6 |   1 |
|  7 |   2 |
+----+-----+

(tbl2.rel -> tbl.pk.)

I would like to select only the rows from tbl which are in relationship with at least n rows from tbl2.

I.e., for n = 2, I want this table:

+----+------+-----+
| pk | attr | val |
+----+------+-----+
|  0 | ohif |   4 |
|  1 | foha |  56 |
|  2 | slns |   2 |
+----+------+-----+

This is the solution I came up with:

SELECT DISTINCT ON (tbl.pk) tbl.*
FROM (
    SELECT tbl.pk
    FROM tbl
    RIGHT OUTER JOIN tbl2 ON tbl2.rel = tbl.pk
    GROUP BY tbl.pk
    HAVING COUNT(tbl2.*) >= 2  -- n
) AS tbl_candidates
LEFT OUTER JOIN tbl ON tbl_candidates.pk = tbl.pk

Can it be done without selecting the candidates with a subquery and re-joining the table with itself?

I'm on Postgres 10. A standard SQL solution would be better, but a Postgres solution is acceptable.

rrrrrrrrrrrrrrrr
  • 344
  • 5
  • 16
  • @philipxy I find it ironic that in asking me to be particular you are being so obviously generic: the problem is stated precisely, in a general form, and accompanied by a fully working solution. I though, apparently mistakenly, that specifically asking for a solution not involving `join`s, would suffice to show that I _did_ research the problem before posting the question, and that this is not a "please do the query for me" faq but an optimization question. Which is also the reason why the question you linked is not a duplicate. – rrrrrrrrrrrrrrrr Jun 05 '19 at 09:02
  • Since the question has been marked as possible duplicate, I edited it. I hope it's clearer now. – rrrrrrrrrrrrrrrr Jun 05 '19 at 09:08
  • I was biased by your (original) title. But there are a pile of duplicates for your closing question & anyway answers for that question are answers for that title--whatever you researched, there are still a lot of easily found duplicates. PS For other ways to write subqueries via joins & joins via subqueries, there are more faqs. – philipxy Jun 05 '19 at 09:25
  • @philipxy You have a point on the title, which I have corrected. I still cannot find a duplicate to my question. – rrrrrrrrrrrrrrrr Jun 05 '19 at 09:28
  • The duplicate link I gave was to the wrong post. Now googling (say) 'site:stackoverflow.com sql Find rows with at least n rows in another table' ... first hit: – philipxy Jun 05 '19 at 09:52
  • Possible duplicate of [SQL: Select rows with a column value that occurs at least N times?](https://stackoverflow.com/questions/5760335/sql-select-rows-with-a-column-value-that-occurs-at-least-n-times) – philipxy Jun 05 '19 at 09:54
  • You might be interested in [this re "self-join" & relational querying](https://stackoverflow.com/a/37384306/3404097). All queries like yours for fixed n can be phrased by the 1st table joined to an n-ary self-join of the 2nd table because you want rows that are in the table & for which there exist n rows matching in the 2nd. So you don't need to self-join the 1st. Instead of the join + self joins you can join the 2 & group & count matches. PS SQL WHERE-IN-subquery is just a way to phrase a join--it doesn't meaningfully avoid joining. PS [2nd hit](https://stackoverflow.com/q/47309139/3404097) – philipxy Jun 05 '19 at 10:04

1 Answers1

0

OK, just join once, as below:

select
    t1.pk,
    t1.attr,
    t1.val
from
    tbl t1
join
    tbl2 t2 on t1.pk = t2.rel
group by
    t1.pk,
    t1.attr,
    t1.val
having(count(1)>=2) order by t1.pk;
 pk | attr | val 
----+------+-----
  0 | ohif |   4
  1 | foha |  56
  2 | slns |   2
(3 rows)

Or just join once and use CTE(with clause), as below:

with tmp as (
select rel from tbl2 group by rel having(count(1)>=2)
)
select b.* from tmp t join tbl b on t.rel = b.pk order by b.pk;
 pk | attr | val 
----+------+-----
  0 | ohif |   4
  1 | foha |  56
  2 | slns |   2
(3 rows)

Is the SQL clearer?

Shawn.X
  • 1,323
  • 6
  • 15