SQL Server : efficient way to find missing Ids

Question

I am using SQL Server to store tens of millions of records. I need to be able to query its tables to find missing rows where there are gaps in the Id column, as there should be none.

I am currently using a solution that I have found here on StackOverflow:

CREATE PROCEDURE [dbo].[find_missing_ids]
    @Table NVARCHAR(128)
AS
BEGIN
    DECLARE @query NVARCHAR(MAX)
    SET @query = N'WITH Missing (missnum, maxid) '
+ N'AS '
+ N'('
+ N' SELECT 1 AS missnum, (select max(Id) from ' + @Table + ') '
+ N'    UNION ALL '
+ N'    SELECT missnum + 1, maxid FROM Missing '
+ N'    WHERE missnum < maxid '
+ N') '
+ N'SELECT missnum '
+ N'FROM Missing '
+ N'LEFT OUTER JOIN ' + @Table + ' tt on tt.Id = Missing.missnum '
+ N'WHERE tt.Id is NULL '
+ N'OPTION (MAXRECURSION 0);';

    EXEC sp_executesql @query
END;

This solution has been working very well, but it has been getting slower and more resource intensive as the tables have grown. Now, running the procedure on a table of 38 million rows is taking about 3.5 minutes and lots of CPU.

Is there a more efficient way to perform this? After a certain range has been found to not contain any missing Ids, I no longer need to check that range again.

I would generate the tally table with a method other than a recursive CTE, like here: https://stackoverflow.com/a/1394239/5070879. Also, an ID column can have gaps with both IDENTITY and SEQUENCE. – Lukasz Szozda, Mar 09 '19 at 16:23
Look at this [answer](https://stackoverflow.com/questions/1312101/how-do-i-find-a-gap-in-running-counter-with-sql) for finding a gap in a running counter. – Peter Smith, Mar 09 '19 at 16:25
I suggest making use of `QUOTENAME`; what you have there is very open to injection. – Thom A, Mar 09 '19 at 16:26
The real question is why you care about gaps in the Id column. – Zohar Peled, Mar 10 '19 at 06:37
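
The injection concern raised in the comments can be illustrated with a minimal sketch. It assumes the table lives in the dbo schema and that the caller passes only a bare table name; the sample table name Stuff and the single MAX(Id) query are illustrative and are not the original procedure.

```sql
-- Sketch: quote the table name before concatenating it into dynamic SQL.
-- Assumption: @Table holds a bare table name with no schema prefix.
DECLARE @Table sysname = N'Stuff';
DECLARE @query NVARCHAR(MAX);

-- QUOTENAME wraps the name in brackets and escapes any ']' inside it,
-- so a value such as N'Stuff; DROP TABLE Stuff' is treated as a single
-- (nonexistent) identifier rather than as extra statements.
SET @query = N'SELECT MAX(Id) AS MaxId FROM dbo.' + QUOTENAME(@Table) + N';';

EXEC sp_executesql @query;
```

If the schema also has to be supplied by the caller, it would need its own parameter and its own QUOTENAME call, because QUOTENAME(N'dbo.Stuff') produces the single identifier [dbo.Stuff].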

David Dubois · Accepted Answer · 2019-03-17T15:19:56.643

JBJ's answer is almost complete. The query needs to return the From and Through for each range of missing values.

select B + 1 as [From], A - 1 as [Through]
from (
    select StuffID as A,
           lag(StuffID) over (order by StuffID) as B
    from Stuff
) z
where A <> B + 1
order by A

I created a test table with 50 million records, then deleted a few. The first row of the result is:

From   Through
33     35

This indicates that all IDs in the range from 33 through 35 are missing, i.e. 33, 34 and 35.

On my machine the query took 37 seconds.
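
The question mentions that a range already found to be gap-free does not need to be checked again. One way to apply that to the query above is sketched below. It assumes the caller stores the last verified Id between runs and passes it in as a hypothetical @LastCheckedId, that the row with that Id still exists, and that StuffID is an indexed INT column.

```sql
-- Sketch: only scan rows at or above the last Id already verified gap-free.
-- @LastCheckedId is a hypothetical value the caller persists between runs.
DECLARE @LastCheckedId INT = 1000000;

SELECT B + 1 AS [From], A - 1 AS [Through]
FROM (
    SELECT StuffID AS A,
           LAG(StuffID) OVER (ORDER BY StuffID) AS B
    FROM Stuff
    -- ">=" keeps the boundary row, so a gap starting right after it is still reported
    WHERE StuffID >= @LastCheckedId
) AS z
WHERE A <> B + 1
ORDER BY A;
```

If the boundary row itself could be deleted later, the first gap after it would start at the wrong value, so this only suits tables where verified rows are never removed.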

score 1 · Answer 2 · answered Mar 09 '19 at 18:38

Try:

select pId 
from (select Id, lag(Id) over (order by Id) pId from yourschema.yourtable) e
where pId <> (Id-1)
order by Id

Replace yourschema.yourtable with the appropriate schema and table names.

JBJ

How will that work for gaps of more than a single value, e.g. `1, 2, 5, 13`? – HABO Mar 09 '19 at 18:47
The next option is to loop through each Id and, if it is not found, put it in a table. However, adding the new records with the new Ids would require SET IDENTITY_INSERT ON/OFF, which is not the best way. There are two possible data types you could use for a new Id column: bigint starting low (identity(-9223372036854775808,1) not null primary key) or uniqueidentifier (default(newid()) not null primary key). Both would require dropping the PK index before adding to the table, and both would wreak havoc on all tables relying on the old key. – JBJ Mar 09 '19 at 23:15
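
The multi-value gap case raised in the comment above can be checked with a small self-contained sketch. It uses an inline VALUES list in place of a real table (the alias t and column Id are illustrative) and runs both the single-column LAG query and a From/Through variant against the Ids 1, 2, 5, 13.

```sql
-- Sketch: the pId-only query returns the last present Id before each gap.
SELECT pId
FROM (
    SELECT Id, LAG(Id) OVER (ORDER BY Id) AS pId
    FROM (VALUES (1), (2), (5), (13)) AS t(Id)
) AS e
WHERE pId <> (Id - 1)
ORDER BY Id;
-- returns 2 and 5

-- Sketch: reporting both ends of each gap covers gaps of any width.
SELECT pId + 1 AS [From], Id - 1 AS [Through]
FROM (
    SELECT Id, LAG(Id) OVER (ORDER BY Id) AS pId
    FROM (VALUES (1), (2), (5), (13)) AS t(Id)
) AS e
WHERE pId <> (Id - 1)
ORDER BY Id;
-- returns (3, 4) and (6, 12)
```

The first row always has a NULL pId and is filtered out by the comparison, so neither query reports Ids missing before the lowest existing Id.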

PSK · Answer 3 · 2019-03-10T04:51:34.493

Try this solution; it will be faster than the recursive CTE.

;WITH CTE AS
(
SELECT ROW_NUMBER() 
         OVER ( 
           ORDER BY (SELECT NULL)) RN 
FROM   ( values  (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v(id) --10 ROWS 
       CROSS JOIN ( values  (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v1(id)--100 ROWS 
       CROSS JOIN ( values  (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v2(id) --1000 ROWS 
       CROSS JOIN ( values  (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v3(id) --10000 ROWS 
       CROSS JOIN ( values  (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v4(id)--100000 ROWS 
       CROSS JOIN ( values  (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v5(id)--1000000 ROWS 
)

SELECT RN AS Missing 
FROM CTE C
LEFT JOIN YOURTABLE T ON T.ID = C.RN
WHERE T.ID IS NULL

If you prefer, you can also use master..[spt_values] to generate the numbers, as follows.

 SELECT (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) RN 
        FROM   master..[spt_values] T1
        CROSS JOIN (select top 500 * from master..[spt_values]) T2

The query above generates 1,268,500 numbers when spt_values holds 2,537 rows; that row count can differ between SQL Server versions.

Note: add as many CROSS JOINs as your Id range requires.
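
As a rough sketch of sizing the number list to the table, the variant below reads the highest Id first and caps the generated numbers at that value. The names dbo.YourTable, Digits, Nums and Tally are placeholders, and the sketch assumes ID is an INT column whose maximum is below 100,000,000.

```sql
-- Sketch: generate only as many numbers as the table's highest Id.
-- dbo.YourTable and its ID column are placeholders.
DECLARE @MaxId INT = ISNULL((SELECT MAX(ID) FROM dbo.YourTable), 0);

WITH Digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
Nums AS (
    -- Eight 10-row sets cross joined allow up to 100,000,000 numbers.
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN
    FROM Digits a CROSS JOIN Digits b CROSS JOIN Digits c CROSS JOIN Digits d1
         CROSS JOIN Digits e CROSS JOIN Digits f CROSS JOIN Digits g CROSS JOIN Digits h
),
Tally AS (
    -- Keep only the numbers 1 through @MaxId.
    SELECT TOP (@MaxId) RN FROM Nums ORDER BY RN
)
SELECT N.RN AS Missing
FROM Tally AS N
LEFT JOIN dbo.YourTable AS T ON T.ID = N.RN
WHERE T.ID IS NULL
ORDER BY N.RN;
```

Unlike the From/Through approach, this lists every missing Id individually, including any that are missing before the lowest existing Id.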
