how to find duplicates and gaps in this scenario in mysql

Question

Hi I have a table that looks like

-----------------------------------------------------------
|  id  |  group_id | source_id | target_id | sortsequence |
-----------------------------------------------------------
|  2   |    1      |    2      |   4       |     1        |   
-----------------------------------------------------------
|  4   |    1      |    20     |   2       |     1        |   
-----------------------------------------------------------
|  5   |    1      |    2      |   14      |     1        |   
-----------------------------------------------------------
|  7   |    1      |    2      |   7       |     3        |   
-----------------------------------------------------------
|  20  |    2      |    20     |   4       |     3        |   
-----------------------------------------------------------
|  21  |    2      |    20     |   4       |     1        |   
-----------------------------------------------------------

Scenario

There are two scenarios that needs to be handled.

Sortsequence column value should be unique against one source_id and group_id. For example if all the records having group_id = 1 AND source_id = 2 should have sortsequence unique. In above example records having id= and 5 which are having group_id = 1 and source_id = 2 have same sortsequence which is 1. This is faulty record. I need to find out these records.
If group_id and source_id is same. The sortsequence columns value should be continous. There should be no gap. For example in above table records having id = 20, 21 having same group_id and source_id and sortsequence value is 3 and 1. Even this is unique but there is a gap in sortsequence value. I need to also find out these records.

MY So Far Effort

I have written a query

SELECT source_id,`group_id`,GROUP_CONCAT(id) AS children 
FROM
    table 
GROUP BY source_id,
  sortsequence,
  `group_id` 
 HAVING COUNT(*) > 1

This query only address the scenario 1. How to handle scenario 2? Is there any way to do it in same query or I have to write other to handle second scenario.

By the way query will be dealing with million of records in table so performance must be very good.

Is the sort sequence always going to start with 1? If so, you could just get count and sum of sortsequence for (source_id, group_id) and apply the formula for sum of n numbers — Chetter Hummin, Mar 26 '13 at 07:13
Then I'd sum of the values and see if it equals count*(count+1)/2. This may need an inner query though. Oracle would probably have some analytical functions to do this for you. But I don't know the equivalent in mysql — Chetter Hummin, Mar 26 '13 at 07:18
possibly related to http://stackoverflow.com/questions/8014577/group-by-and-aggregate-sequential-numeric-values. — didierc, Mar 26 '13 at 07:38
@Chetter Hummin Grouping only by group_id and source_id, and then checking for groups where `COUNT(DISTINCT sortsequence) <> COUNT(sortsequence) OR COUNT(sortsequence) <> MAX(sortsequence)` should detect both problems at once. — Terje D., Mar 26 '13 at 08:19

score 1 · Accepted Answer · answered Mar 27 '13 at 09:57

Got answer from Tere J Comments. Following query covers above mentioned both criteria.

 SELECT 
     source_id, `group_id`, GROUP_CONCAT(id) AS faultyIDS    
 FROM
     table
 GROUP BY
     source_id,group_id 
 HAVING
     COUNT(DISTINCT sortsequence) <> COUNT(sortsequence) OR COUNT(sortsequence) <> MAX(sortsequence) OR MIN(sortsequence) <> 1

May be it can help others.

score 0 · Answer 2 · answered Mar 26 '13 at 08:01

Try this query it will solve both of the cases as you have mentioned in the question.

SELECT 
   a.* 
FROM 
   tbl a
INNER JOIN 
   (select 
       @rn:=IF(@prevG = group_id AND @prevS = source_id, @rn + 1, 1) As rId,
       @prevG:=group_id AS group_id, 
       @prevS:=source_id AS source_id, 
       id, 
       sortsequence
    FROM 
       tbl 
    join 
       (select @rn:=0, @prevS:=0, @prevG:=0)b
    order by group_id, source_id, id) b
ON a.id = b.id AND a.SORTSEQUENCE <> b.RID;

FIDDLE

how to find duplicates and gaps in this scenario in mysql

2 Answers2