PostgreSQL: SELECT count of rows that are not DISTINCT

Question

I'm using PostgreSQL 9.3, and I've got this big, ugly query...

SELECT cai.id
FROM common_activityinstance cai
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN common_activitysetting cas ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-11'
    AND (   key = 'disable_student_nav' AND value = 'True'
         OR key = 'pacing' AND value = 'student');

...which gives me this result...

How can I improve my query to get the count of the duplicate rows (2 in this example)?

@wingedpanther: Good suggestion. That gives me the two duplicate IDs, but not the count. The number of rows that have two duplicate IDs could be in the thousands, so I don't want to return all that data from my server and have to count it on the client side. — Rob Johansen, Sep 12 '15 at 04:55

score 4 · Accepted Answer · edited May 23 '17 at 12:14

Using Sub-Query

SELECT count(*) AS total_dups
FROM (
    SELECT count(cai.id)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-11'
        AND (   key = 'disable_student_nav' AND value = 'True'
             OR key = 'pacing' AND value = 'student')
    GROUP BY cai.id
    HAVING count(cai.id) > 1
) t;

GROUP BY cai.id HAVING count(cai.id) > 1 keeps one row for each cai.id that appears more than once. The outer SELECT count(*) FROM (...) t then counts those rows, which gives the number of duplicated IDs without returning the IDs themselves to the client.
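The group-then-count idea can be tried on a toy table before running it against real data. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table name matches and its rows are hypothetical and represent the already-joined, already-filtered result of the question's query.

```python
import sqlite3

# Hypothetical stand-in for the joined result: one row per matching setting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [
        (1, "pacing", "student"),
        (2, "pacing", "student"),
        (2, "disable_student_nav", "True"),
        (3, "disable_student_nav", "True"),
        (4, "pacing", "student"),
        (4, "disable_student_nav", "True"),
    ],
)

# Inner query: one row per id that occurs more than once.
# Outer query: count those rows, so only a single number leaves the database.
total_dups = conn.execute(
    """
    SELECT count(*) AS total_dups
    FROM (
        SELECT id
        FROM matches
        GROUP BY id
        HAVING count(id) > 1
    ) t
    """
).fetchone()[0]

print(total_dups)  # 2 (ids 2 and 4 each appear twice)
```

Only the single count crosses the connection, which is the point raised in the comment about not counting on the client side.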

OR

Using CTE

WITH cte AS (
    SELECT count(cai.id)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-11'
        AND (   key = 'disable_student_nav' AND value = 'True'
             OR key = 'pacing' AND value = 'student')
    GROUP BY cai.id
    HAVING count(cai.id) > 1
)
SELECT count(*) AS total_dups
FROM cte;

Related question: Difference between CTE and SubQuery?

score 0 · Answer 2 · answered Sep 12 '15 at 11:27

Because of the structure of the query, I suspect that duplicates can only arise from the OR part of the condition. If each id can appear at most twice, you can do the calculation without a subquery:

SELECT count(cai.id) - count(distinct cai.id)
FROM common_activityinstance cai JOIN
     common_activityinstance_settings cais
     ON cai.id = cais.activityinstance_id JOIN
     common_activitysetting cas
     ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-11' AND
      (key, value) IN (('disable_student_nav', 'True'), ('pacing', 'student'));

Note: This only works in the special case that each id appears only once or twice.
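To see why the restriction matters, here is a small sketch comparing the subtraction with the GROUP BY / HAVING count when one id appears three times. It uses Python's sqlite3 module as a stand-in for PostgreSQL, and the table matches and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER)")
# Hypothetical data: id 1 appears once, id 2 twice, id 3 three times.
conn.executemany(
    "INSERT INTO matches VALUES (?)",
    [(1,), (2,), (2,), (3,), (3,), (3,)],
)

# Subtraction: total rows minus distinct ids = number of "extra" rows.
extra_rows = conn.execute(
    "SELECT count(id) - count(DISTINCT id) FROM matches"
).fetchone()[0]

# Grouping: number of ids that occur more than once.
dup_ids = conn.execute(
    """
    SELECT count(*)
    FROM (SELECT id FROM matches GROUP BY id HAVING count(id) > 1) t
    """
).fetchone()[0]

print(extra_rows, dup_ids)  # 3 2
```

With 6 rows and 3 distinct ids the subtraction yields 3, while only 2 ids are actually duplicated. The two numbers agree only when no id appears more than twice.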
