I am using SQL Server to store tens of millions of records. I need to query its tables to find missing rows, that is, gaps in the Id column, because there should be none.
I am currently using a solution that I have found here on StackOverflow:
    CREATE PROCEDURE [dbo].[find_missing_ids]
        @Table NVARCHAR(128)
    AS
    BEGIN
        DECLARE @query NVARCHAR(MAX);

        -- Recursive CTE that generates every number from 1 to MAX(Id),
        -- then keeps the numbers that have no matching row in the table.
        SET @query = N'WITH Missing (missnum, maxid) '
            + N'AS '
            + N'('
            + N' SELECT 1 AS missnum, (select max(Id) from ' + @Table + N') '
            + N' UNION ALL '
            + N' SELECT missnum + 1, maxid FROM Missing '
            + N' WHERE missnum < maxid '
            + N') '
            + N'SELECT missnum '
            + N'FROM Missing '
            + N'LEFT OUTER JOIN ' + @Table + N' tt on tt.Id = Missing.missnum '
            + N'WHERE tt.Id is NULL '
            + N'OPTION (MAXRECURSION 0);';

        EXEC sp_executesql @query;
    END;
This solution has worked very well, but it has become slower and more resource-intensive as the tables have grown. Running the procedure on a table of 38 million rows now takes about 3.5 minutes and uses a lot of CPU.
Is there a more efficient way to perform this? After a certain range has been found to not contain any missing Ids, I no longer need to check that range again.
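To make the "skip ranges already checked" idea concrete, here is a rough sketch of the kind of query I have in mind. It is not something I have benchmarked. It assumes SQL Server 2012 or later (for LEAD), an index on Id, and two hypothetical names that are not in my real schema: a table dbo.MyTable and a variable @LastCheckedId holding the highest Id already known to be gap-free. It reports each gap as a start/end pair rather than one row per missing Id.

```sql
-- Hypothetical: highest Id already verified to have no gaps below it.
-- This Id is assumed to exist in the table so it can anchor the scan.
DECLARE @LastCheckedId BIGINT = 0;

WITH Ordered AS
(
    -- Pair every Id with the next higher Id in the unchecked range.
    SELECT Id,
           LEAD(Id) OVER (ORDER BY Id) AS NextId
    FROM dbo.MyTable            -- hypothetical table name
    WHERE Id >= @LastCheckedId
)
-- A difference greater than 1 means the Ids in between are missing.
SELECT Id + 1     AS GapStart,
       NextId - 1 AS GapEnd
FROM Ordered
WHERE NextId - Id > 1;
```

Unlike the recursive CTE, this only looks between existing rows, so it would not report Ids missing before the first row in the scanned range (for example, 1 to 4 if the smallest Id is 5).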