SQL: How to select persons with at least N comments?

Question

I have a simple (I suppose) problem, which is baffling me (I'm not a SQL expert as you will see soon... :-).

I have a table person with fields id and name, and a table comment with field id, id_person and text:

---------------
 table: person
---------------
 id | name
---------------

---------------------------
 table: comment
---------------------------
 id | id_person | text
---------------------------

The comments are linked to persons on person.id => comment.id_person.
Each person can have many comments.
I would prefer no to save the number of comments in any table.

The question is: How to select all persons with at least N comments?

This is what I am trying right now, but it's of course wrong...:

SELECT * FROM person WHERE (SELECT COUNT(*) FROM comment WHERE id_person = 2) >= N

P.S.: I'm currently working with sqlite, but a standard SQL answer is fine...

"This is what I am trying right now, but it's of course wrong," what makes you say it's wrong? — Matthew Haugen, Mar 11 '15 at 20:09
'cause I had to se a specific 'id_person', but I need ALL persons... :-( As you see I'm quite confused, with SQL... :-( — MarcoS, Mar 11 '15 at 20:11
No worries. I included a version in my answer that should fix that issue with yours. Other than that, it was a really good start, and likely what I would have done. — Matthew Haugen, Mar 11 '15 at 20:12

score 4 · Accepted Answer · edited May 23 '17 at 12:12

I'm not sure I see what's wrong with what you've tried. You hardcode the ID, but if you just swap that out, it should be fine.

SELECT *
FROM person
WHERE (SELECT COUNT(*)
       FROM comment
       WHERE id_person = person.id) >= N

Another solution, whether better or not is hard to know (you should try each if performance is a concern, or at least check the query plans), would be to utilize a GROUP BY clause:

SELECT *
FROM person
WHERE id IN (SELECT id_person
             FROM comment
             GROUP BY id_person
             HAVING COUNT(*) >= N)

I like using the IN clause here, as compared to an INNER JOIN, because it lets you do SELECT *, as you had in your example. Doing that can, of course, sometimes not be ideal, but it sounds like it's appropriate for your case.

In each of these, of course, you'll have to swap out N with either a parameter or literal value.

Lieven Keersmaekers · Answer 2 · 2015-03-11T20:15:45.003

3

Simplest would be to use a group by and a having clause like this

select p.name 
from   person p
       inner join comment c on c.id_person = p.id
group by
       p.name
having count(*) = 2

As a sidenote: I would rename your columns to

---------------------------
table: person
---------------
 id_person | name
---------------

---------------------------
 table: comment
---------------------------
 id_comment | id_person | text
---------------------------

as to make it clear what columns are related to each other. Someday you will encounter a database schema that's new to you where the names don't match and you have to resort to looking up foreign key relations to work things out. Trust me, it's not fun.

edited Mar 11 '15 at 20:15

answered Mar 11 '15 at 20:10

Lieven Keersmaekers

57,207
13
112
146

1

Instead of `=`, I think the OP is wanting `>=`. Also, I personally have no issues with id columns, as long as the foreign keys are specific enough (as they appear to already be in this case). It's really a preference from what I've seen where recommending one over the other can be debated -- here's a good post: http://programmers.stackexchange.com/questions/114728/why-is-naming-a-tables-primary-key-column-id-considered-bad-practice – sgeddes Mar 11 '15 at 20:28
@sgeddes - good post, thanks. For me, [this answer](http://programmers.stackexchange.com/a/114730/4436) really sums it up and shows the difference. It doesn't matter much if you designed and implemented the tables, you'll know what is what but it makes a whole lot of difference for those having to maintain the system. I've literaly wasted hours having to maintain large legacy systems where there's no clear relation between tables by looking at the column names *(not even FK for that matter)* – Lieven Keersmaekers Mar 11 '15 at 20:39
1

:) -- You linked to the answer in the post I provided -- the debate continues (look at the comments to the answer)! A well designed data model with descriptive foreign keys generally resolves the issue in my opinion, but I do hear what you're saying. As a developer though, I prefer having commonality between primary keys as its is easier to structure the code (reusing base classes for example). Fun to debate! – sgeddes Mar 11 '15 at 20:48
@sgeddes - I know, that was intentional :). I have been developing most of my professional career and was lucky enough to be able to create my own data models *(and only having to clean up my own mess)*. The last three years though, I have taken on a support position where most of the time I have to figure out other people's code. Since then, consistent naming *(or the lack of it)* has become a bit of a peeve for me. I would settle for `id` columns though *(but I prefer full names :))* – Lieven Keersmaekers Mar 11 '15 at 20:58
Thanks for your comments, and for your interesting debate! And yes, I want `>=`... – MarcoS Mar 11 '15 at 22:12

AXMIM · Answer 3 · 2015-03-11T20:18:09.963

0

DECLARE @N  INT--Here I declared variable N, but you could simply hardcode the value instead.
SELECT @N = 2

select  person.id,
        person.name 
from person 
INNER JOIN comment ON person.id = comment.id_person
GROUP BY person.id,
         person.name 
HAVING COUNT(comment.id) >= @N

edited Mar 11 '15 at 20:18

answered Mar 11 '15 at 20:10

AXMIM

2,424
1
20
38

SQL: How to select persons with at least N comments?

3 Answers3