Find duplicate rows in PostgreSQL with additional criteria

Question

I have a table called entries that has the following columns: case_id, number and filed_on.

If I were only looking for duplicates where the case_id and number were the same, I would use the following query:

SELECT case_id, number, count(*) FROM entries GROUP BY case_id, number HAVING count(*) > 1;

But I would like to filter by an additional criterion, namely, that at least 1 of the duplicate rows has a filed_on that is null.

I thought the following query would work, but I think it gives me duplicate rows where ALL the duplicates have filed_on set to null, instead of duplicate rows where 1 or more of the rows have filed_on of null:

SELECT case_id, number, count(*) FROM entries WHERE filed_on IS NULL GROUP BY case_id, number HAVING count(*) > 1;

Any ideas for how I can modify this query to get what I want?

score 2 · Accepted Answer · edited May 23 '17 at 12:09

You want a condition which is checked after grouping, not before, i.e. HAVING instead of WHERE. Note that the condition should either be one of grouping fields or aggregate (just like in SELECT). You should be able to count number of rows which satisfies the condition like in this answer:

SELECT case_id, number, count(*)
FROM entries
GROUP BY case_id, number
HAVING (count(*) > 1) AND (count(CASE WHEN filed_on IS NULL THEN 1 END) >= 1)

See SQL Fiddle

Find duplicate rows in PostgreSQL with additional criteria

1 Answers1