0

I'm using PostgreSQL 9.3, and I've got this big, ugly query...

SELECT cai.id
FROM common_activityinstance cai
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN common_activitysetting cas ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-11'
    AND (   key = 'disable_student_nav' AND value = 'True'
         OR key = 'pacing' AND value = 'student');

...which gives me this result...

    id  
  ------
   1352
   1352
   1353
   1353
   1354
   1355
 (6 rows)

How can I improve my query to get the count of the duplicate rows (2 in this example)?

Vivek S.
  • 19,945
  • 7
  • 68
  • 85
Rob Johansen
  • 5,076
  • 10
  • 40
  • 72

2 Answers2

4

Using Sub-Query

select count(*) total_dups from(
    SELECT count(cai.id)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-11'
        AND (key = 'disable_student_nav'
                AND value = 'True'
                OR key = 'pacing'
                AND value = 'student')
    group by cai.id having count(cai.id) >1
    ) t

group by cai.id having count(cai.id) > 1 can be used to find out duplicates count of each cai.id,Then SELECT count(cai.id)(select ...)t can be used to find out count of all duplicate in the Sub-Query.

OR

Using CTE

with cte as (
SELECT count(cai.id)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-11'
        AND (key = 'disable_student_nav'
                AND value = 'True'
                OR key = 'pacing'
                AND value = 'student')
    group by cai.id having count(cai.id) >1
    )

    select count(*) from  cte

Difference between CTE and SubQuery?

Community
  • 1
  • 1
Vivek S.
  • 19,945
  • 7
  • 68
  • 85
0

Because of the structure of the query, I suspect that duplicates might only arise from the or part of the query. If you are limited to at most two duplicates, you can do the calculation without a subquery:

SELECT count(cai.id) - count(distinct cai.id)
FROM common_activityinstance cai JOIN
     common_activityinstance_settings cais
     ON cai.id = cais.activityinstance_id JOIN
     common_activitysetting cas
     ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-11' AND
      (key, value) IN (('disable_student_nav', 'True'), ('pacing', 'student'));

Note: This only works in the special case that each id appears only once or twice.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786