7

Hi I have a table that looks like

-----------------------------------------------------------
|  id  |  group_id | source_id | target_id | sortsequence |
-----------------------------------------------------------
|  2   |    1      |    2      |   4       |     1        |   
-----------------------------------------------------------
|  4   |    1      |    20     |   2       |     1        |   
-----------------------------------------------------------
|  5   |    1      |    2      |   14      |     1        |   
-----------------------------------------------------------
|  7   |    1      |    2      |   7       |     3        |   
-----------------------------------------------------------
|  20  |    2      |    20     |   4       |     3        |   
-----------------------------------------------------------
|  21  |    2      |    20     |   4       |     1        |   
-----------------------------------------------------------

Scenario

There are two scenarios that needs to be handled.

  1. Sortsequence column value should be unique against one source_id and group_id. For example if all the records having group_id = 1 AND source_id = 2 should have sortsequence unique. In above example records having id= and 5 which are having group_id = 1 and source_id = 2 have same sortsequence which is 1. This is faulty record. I need to find out these records.
  2. If group_id and source_id is same. The sortsequence columns value should be continous. There should be no gap. For example in above table records having id = 20, 21 having same group_id and source_id and sortsequence value is 3 and 1. Even this is unique but there is a gap in sortsequence value. I need to also find out these records.

MY So Far Effort

I have written a query

SELECT source_id,`group_id`,GROUP_CONCAT(id) AS children 
FROM
    table 
GROUP BY source_id,
  sortsequence,
  `group_id` 
 HAVING COUNT(*) > 1 

This query only address the scenario 1. How to handle scenario 2? Is there any way to do it in same query or I have to write other to handle second scenario.

By the way query will be dealing with million of records in table so performance must be very good.

Awais Qarni
  • 17,492
  • 24
  • 75
  • 137
  • Is the sort sequence always going to start with 1? If so, you could just get count and sum of sortsequence for (source_id, group_id) and apply the formula for sum of n numbers – Chetter Hummin Mar 26 '13 at 07:13
  • @ChetterHummin Yes it must start from 1 – Awais Qarni Mar 26 '13 at 07:15
  • Then I'd sum of the values and see if it equals count*(count+1)/2. This may need an inner query though. Oracle would probably have some analytical functions to do this for you. But I don't know the equivalent in mysql – Chetter Hummin Mar 26 '13 at 07:18
  • possibly related to http://stackoverflow.com/questions/8014577/group-by-and-aggregate-sequential-numeric-values. – didierc Mar 26 '13 at 07:38
  • @Chetter Hummin Grouping only by group_id and source_id, and then checking for groups where `COUNT(DISTINCT sortsequence) <> COUNT(sortsequence) OR COUNT(sortsequence) <> MAX(sortsequence)` should detect both problems at once. – Terje D. Mar 26 '13 at 08:19
  • @TerjeD. Yup. Definitely seems to be a simpler solution – Chetter Hummin Mar 26 '13 at 08:23

2 Answers2

1

Got answer from Tere J Comments. Following query covers above mentioned both criteria.

 SELECT 
     source_id, `group_id`, GROUP_CONCAT(id) AS faultyIDS    
 FROM
     table
 GROUP BY
     source_id,group_id 
 HAVING
     COUNT(DISTINCT sortsequence) <> COUNT(sortsequence) OR COUNT(sortsequence) <> MAX(sortsequence) OR MIN(sortsequence) <> 1

May be it can help others.

Awais Qarni
  • 17,492
  • 24
  • 75
  • 137
0

Try this query it will solve both of the cases as you have mentioned in the question.

SELECT 
   a.* 
FROM 
   tbl a
INNER JOIN 
   (select 
       @rn:=IF(@prevG = group_id AND @prevS = source_id, @rn + 1, 1) As rId,
       @prevG:=group_id AS group_id, 
       @prevS:=source_id AS source_id, 
       id, 
       sortsequence
    FROM 
       tbl 
    join 
       (select @rn:=0, @prevS:=0, @prevG:=0)b
    order by group_id, source_id, id) b
ON a.id = b.id AND a.SORTSEQUENCE <> b.RID;

FIDDLE

Meherzad
  • 8,433
  • 1
  • 30
  • 40