-2

I would like to get all missing numbers in a sequence of numbers.
Just wondering if there is a better approach than below?

SELECT x
  FROM 
  (
    SELECT x,
           LAG(x,1) OVER ( ORDER BY x ) prev_x
      FROM 
       ( SELECT * FROM
        ( SELECT 1 AS x ),
        ( SELECT 2 AS x ),
        ( SELECT 3 AS x ),
        ( SELECT 4 AS x ),
        ( SELECT 5 AS x ),
        ( SELECT 6 AS x ),
        ( SELECT 8 AS x ),
        ( SELECT 10 AS x ),
        ( SELECT 11 AS x )
       )
  ) 
 WHERE x-prev_x > 1;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Teja
  • 13,214
  • 36
  • 93
  • 155

3 Answers3

3

Let me be honest with you!
Any another working solution will be better then presented in question - for one simple reason - it is wrong! it does not return missing numbers at all! it rather shows number after next gap. that's all (hope you will appreciate that I opened your eyes on this)

Now, about better solution(s) - there are tons of options for you to pursue.
Note: below options for BigQuery ONLY!

Option 1

BigQuery Standard SQL - see How to Enable Standard SQL

WITH YourTable AS (
  SELECT 1 AS x UNION ALL
  SELECT 2 AS x UNION ALL
  SELECT 3 AS x UNION ALL
  SELECT 6 AS x UNION ALL
  SELECT 8 AS x UNION ALL
  SELECT 10 AS x UNION ALL
  SELECT 11 AS x
),
nums AS (
  SELECT num 
  FROM UNNEST(GENERATE_ARRAY((SELECT MIN(x) FROM YourTable), (SELECT MAX(x) FROM YourTable))) AS num
)
SELECT num FROM nums
LEFT JOIN YourTable ON num = x
WHERE x IS NULL
ORDER BY num

Option 2

BigQuery Legacy SQL you can try below (here you need to set start/min and end/max values inside select expression for nums table

SELECT num FROM (
  SELECT num FROM (
    SELECT ROW_NUMBER() OVER() AS num, * 
    FROM (FLATTEN((SELECT SPLIT(RPAD('', 11, '.'),'') AS h FROM (SELECT NULL)), h))
  ) WHERE num BETWEEN 1 AND 11
) AS nums
LEFT JOIN (
  SELECT x FROM
    (SELECT 1 AS x),
    (SELECT 2 AS x),
    (SELECT 3 AS x),
    (SELECT 6 AS x),
    (SELECT 8 AS x),
    (SELECT 10 AS x),
    (SELECT 11 AS x)
) AS YourTable
ON num = x
WHERE x IS NULL

Option 3

BigQuery Legacy SQL - if you want not to have dependency on min and max and have needs in setting those values - you can use below solution - it only requires to set high enough max to accommodate your expected growth (I put 1000 for example)

SELECT num FROM (
  SELECT num FROM (
    SELECT ROW_NUMBER() OVER() AS num, * 
    FROM (FLATTEN((SELECT SPLIT(RPAD('', 1000, '.'),'') AS h FROM (SELECT NULL)), h))
  ) WHERE num BETWEEN 1 AND 1000
) AS nums
LEFT JOIN YourTable
ON num = x
WHERE x IS NULL
AND num BETWEEN (SELECT MIN(x) FROM YourTable) AND (SELECT MAX(x) FROM YourTable) 

Option 4 (for some reason - my favorite so far)

BigQuery Standard SQL - without explicit joins

WITH YourTable AS (
  SELECT 1 AS x UNION ALL
  SELECT 2 AS x UNION ALL
  SELECT 3 AS x UNION ALL
  SELECT 6 AS x UNION ALL
  SELECT 8 AS x UNION ALL
  SELECT 10 AS x UNION ALL
  SELECT 11 AS x
)
SELECT num
FROM (SELECT x, LEAD(x) OVER(ORDER BY x) AS next_x FROM YourTable),  
     UNNEST(GENERATE_ARRAY(x + 1,next_x - 1)) AS num
WHERE next_x - x > 1
ORDER BY x
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
1

Your query can be written much more succinctly like so:

SELECT x
FROM (
    SELECT x,
           lag(x, 1) OVER ( ORDER BY x ) prev_x
    FROM ( VALUES (1), (2), (3), (4), (5), (6), (8), (10), (11) ) v(x)
) sub
WHERE x-prev_x > 1;

This will return the next highest value after a miss (8, 10), not the missing values themselves (7, 9). But of course you do not have the values handy.

If you know the range of values in the sequence, then you can use this:

SELECT s.x
FROM generate_series(<<min>>, <<max>>) s(x)
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL;

This returns the actual missing values.

If you do not know the range of values, you need to add a sub-query:

SELECT s.x
FROM ( SELECT min(x), max(x) FROM my_table ) r
JOIN generate_series(r.min, r.max) s(x) ON true
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL;

Alternatively, instead of a LEFT JOIN:

SELECT x
FROM ( SELECT min(x), max(x) FROM my_table ) r,
     generate_series(r.min, r.max) s(x)
WHERE NOT EXISTS (SELECT 1 FROM my_table t WHERE t.x = s.x);
Patrick
  • 29,357
  • 6
  • 62
  • 90
  • This is a solid answer. I don't see a reason for downvotes. Minor confusing detail: the OP's example omits 7 and 9 (not 8 and 9). – Erwin Brandstetter Sep 11 '16 at 14:51
  • @ErwinBrandstetter A down-vote is only a click away (but I did mess up the parentheses in the VALUES clause in my initial answer). Thanks for the thumbs-up. List of values corrected. – Patrick Sep 12 '16 at 07:03
1

The shortest solution in Postgres is with standard SQL EXCEPT:

WITH tbl(x) AS (SELECT unnest ('{1,2,3,4,5,6,8,10,11}'::int[]))
-- the CTE provides a temp table - might be an actual table instead
SELECT generate_series(min(x), max(x)) FROM tbl
EXCEPT ALL
TABLE  tbl;

The set-returning function unnest() is Postgres specific and just the shortest syntax to provide your set of numbers as table.

Also works with duplicates or NULL values in the data.

TABLE tbl is (standard SQL!) short syntax for SELECT * FROM tbl:

Related (with more explanation):

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I am fine with this solution for postgres. Is there any way to implement this without using the functions..? – Teja Sep 12 '16 at 18:21