6

I have the following data in a SQL Table:

enter image description here

I need to query the data so I can get a list of missing "familyid" per employee.

For example, I should get for Employee 1021 that is missing in the sequence the IDs: 2 and 5 and for Employee 1027 should get the missing numbers 1 and 6.

Any clue on how to query that?

Appreciate any help.

VAAA
  • 14,531
  • 28
  • 130
  • 253

5 Answers5

3

Find the first missing value

I would use the ROW_NUMBER window function to assign the "correct" sequence ID number. Assuming that the sequence ID restarts every time the employee ID changes:

SELECT
  e.id,
  e.name,
  e.employee_number,
  e.relation,
  e.familyid,
  ROW_NUMBER() OVER(PARTITION BY e.employeeid ORDER BY familyid) - 1 AS sequenceid
FROM employee_members e

Then, I would filter the result set to only include the rows with mismatching sequence IDs:

SELECT *
FROM (
  SELECT
    e.id,
    e.name,
    e.employee_number,
    e.relation,
    e.familyid,
    ROW_NUMBER() OVER(PARTITION BY e.employeeid ORDER BY familyid) - 1 AS sequenceid
  FROM employee_members e
) a
WHERE a.familyid <> a.sequenceid

Then again, you should easily group by employee_number and find the first missing sequence ID for each employee:

SELECT
  a.employee_number,
  MIN(a.sequence_id) AS first_missing
FROM (
  SELECT
    e.id,
    e.name,
    e.employee_number,
    e.relation,
    e.familyid,
    ROW_NUMBER() OVER(PARTITION BY e.employeeid ORDER BY familyid) - 1 AS sequenceid
  FROM employee_members e
) a
WHERE a.familyid <> a.sequenceid
GROUP BY a.employee_number

Finding all the missing values

Extending the previous query, we can detect a missing value every time the difference between familyid and sequenceid changes:

-- Warning: this is totally untested :-/
SELECT
  b.employee_number,
  MIN(b.sequence_id) AS missing
FROM (
  SELECT
    a.*,
    a.familyid - a.sequenceid AS displacement
    SELECT
      e.*,
      ROW_NUMBER() OVER(PARTITION BY e.employeeid ORDER BY familyid) - 1 AS sequenceid
    FROM employee_members e
  ) a
) b
WHERE b.displacement <> 0
GROUP BY
  b.employee_number,
  b.displacement
Danilo Piazzalunga
  • 7,590
  • 5
  • 49
  • 75
  • This is what I was going to go for... please note it doesn't work if there's more than one missing in a row. – Ben Jul 06 '13 at 15:58
  • Ok, so this will show the first missing, is not possible to get all the missings by employee ? Thanks a lot – VAAA Jul 06 '13 at 16:00
  • 1
    The best solution is to use CTE (Common Table Expressions), as in Gordon Linoff's answser. See also: http://stackoverflow.com/a/5279088/238421 – Danilo Piazzalunga Jul 06 '13 at 16:09
  • @DaniloPiazzalunga - Why is that better than my solution -- my solution is faster. – Hogan Jul 06 '13 at 16:15
  • @Hogan: Sorry, that was a simple matter of preference. I am just addicted to sequence-generating recursive queries. – Danilo Piazzalunga Jul 06 '13 at 16:34
  • @DaniloPiazzalunga is working but with a small bug, the first missing number is detected right but the second one is the missing - 1, so for example if missings are: 4 and 8 your last query return 4 and 7. – VAAA Jul 06 '13 at 17:23
3

Here is one approach. Calculate the maximum family id for each employee. Then join this to a list of numbers up to the maximum family id. The result has one row for each employee and expected family id.

Do a left outer join from this back to the original data, on the familyid and the number. Where nothing matches, those are the missing values:

with nums as (
      select 1 as n
      union all
      select n+1
      from nums
      where n < 20
     )
select en.employee, n.n as MissingFamilyId
from (select employee, min(familyid) as minfi, max(familyid) as maxfi
      from t
      group by employee
     ) en join
     nums n
     on n.n <= maxfi left outer join
     t
     on t.employee = en.employee and
        t.familyid = n.n
where t.employee_number is null;

Note that this will not work when the missing familyid is that last number in the sequence. But it might be the best that you can do with your data structure.

Also the above query assumes that there are at most 20 family members.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

This will work, you select all "Dependents" and left join on the prior row. If that row is not there then you show the result:

SELECT 'Missing Prior', t1.*
FROM employee_members t1
LEFT JOIN employee_members t2 ON t1.employee_number = t2.employee_number 
                  AND (t1.familyid-1) = t2.familyid 
WHERE t2.employee_number is null and t1.relation == 'Dependent'

Another version that shows you the missing number:

SELECT t1.employee_number, t1.familyid-1 as Missing_Member
FROM employee_members t1
LEFT JOIN employee_members t2 ON t1.employee_number = t2.employee_number 
                  AND (t1.familyid-1) = t2.familyid 
WHERE t2.employee_number is null and t1.relation == 'Dependent'
Hogan
  • 69,564
  • 10
  • 76
  • 117
2

Another solution: Build up a table with all possible values from the sequence (can play with identity for this). Then left join on the table where source table is null.

DECLARE @Seq TABLE (id INT IDENTITY(1, 1))
DECLARE @iter INT = 1

WHILE @iter <= (
        SELECT MAX([your ID column])
        FROM [Offending Table]
        )
BEGIN
    INSERT @Seq DEFAULT
    VALUES

    SET @iter = @iter + 1
END

SELECT id
FROM @seq s
LEFT JOIN [Offending Table] ot ON s.id = ot.[your ID column]
WHERE ot.[your ID column] IS NULL
apeman
  • 390
  • 5
  • 18
Drakey
  • 66
  • 2
0

This select will retrieve list of missing "familyid" per employee using CTE approach.

QUERY :

   WITH emp_grp (
        EmployeeID
        ,MaxFamilyID
        )
    AS (
        SELECT e2.EmployeeID
            ,MAX(e2.FamilyID) MaxFamilyID
        FROM employee_number e2
        GROUP BY e2.EmployeeID
        )
        ,emp_mem
    AS (
        SELECT EmployeeID
            ,0 AS FamilyID
            ,MaxFamilyID
        FROM emp_grp

        UNION ALL

        SELECT EmployeeID
            ,FamilyID + 1 AS FamilyID
            ,MaxFamilyID
        FROM emp_mem
        WHERE emp_mem.FamilyID < MaxFamilyID
        )

    SELECT emp_mem.EmployeeID
        ,emp_mem.FamilyID
    FROM emp_mem
    LEFT JOIN employee_number emp_num ON emp_mem.EmployeeID = emp_num.EmployeeID
        AND emp_mem.FamilyID = emp_num.FamilyID
    WHERE emp_num.EmployeeID IS NULL
    ORDER BY emp_mem.EmployeeID
        ,emp_mem.FamilyID

OPTION ( MAXRECURSION 32767)

OUTPUT :

EmployeeID  FamilyID
----------- -----------
1021        2
1021        5
1027        1
1027        6
S.M
  • 776
  • 3
  • 8
  • Hi, why do you have 3 tables: employee_number, emp_grp and emp_mem? I just have 1 table. – VAAA Jul 06 '13 at 21:37
  • Im getting this error: Msg 530, Level 16, State 1, Line 1 The statement terminated. The maximum recursion 100 has been exhausted before statement completion. – VAAA Jul 06 '13 at 21:40
  • To increase the recursion, add this line OPTION ( MAXRECURSION 200) at end of the statement. but show your query, lets see what's wrong. – S.M Jul 06 '13 at 21:56
  • what is Maximum value of FamilyID in this table? – S.M Jul 06 '13 at 22:00
  • The maximum value of FamilyID is 15 – VAAA Jul 06 '13 at 22:56
  • change Dependent field with familyid in your query – S.M Jul 06 '13 at 23:31
  • I suspect that familyid is not in employee_table, if its in different table, join that table with employee_table to get familyid field. Hope this helps. – S.M Jul 06 '13 at 23:44
  • actually Dependant is same as FamilyId – VAAA Jul 07 '13 at 00:01
  • add this OPTION ( MAXRECURSION 32767) at the end of the query, this is max recursion allowed. this is not efficient way of doing this – S.M Jul 07 '13 at 00:14