Given a sequence of numbers how to identify the missing numbers

Question

I would like to get all missing numbers in a sequence of numbers.
Just wondering if there is a better approach than below?

SELECT x
  FROM 
  (
    SELECT x,
           LAG(x,1) OVER ( ORDER BY x ) prev_x
      FROM 
       ( SELECT * FROM
        ( SELECT 1 AS x ),
        ( SELECT 2 AS x ),
        ( SELECT 3 AS x ),
        ( SELECT 4 AS x ),
        ( SELECT 5 AS x ),
        ( SELECT 6 AS x ),
        ( SELECT 8 AS x ),
        ( SELECT 10 AS x ),
        ( SELECT 11 AS x )
       )
  ) 
 WHERE x-prev_x > 1;

it `really` depends on which flavor of sql you are in with this! based on tags you added - i would assume this is for BigQuery? please confirm or clarify! — Mikhail Berlyant, Sep 11 '16 at 04:09
SO prefers specific questions that have a particular answer. Open questions (too many possible answers) are discouraged. — Erwin Brandstetter, Sep 11 '16 at 15:08

Mikhail Berlyant · Answer 1 · 2016-09-11T05:55:03.373

Let me be honest with you!
Any another working solution will be better then presented in question - for one simple reason - it is wrong! it does not return missing numbers at all! it rather shows number after next gap. that's all (hope you will appreciate that I opened your eyes on this)

Now, about better solution(s) - there are tons of options for you to pursue.
Note: below options for BigQuery ONLY!

Option 1

BigQuery Standard SQL - see How to Enable Standard SQL

WITH YourTable AS (
  SELECT 1 AS x UNION ALL
  SELECT 2 AS x UNION ALL
  SELECT 3 AS x UNION ALL
  SELECT 6 AS x UNION ALL
  SELECT 8 AS x UNION ALL
  SELECT 10 AS x UNION ALL
  SELECT 11 AS x
),
nums AS (
  SELECT num 
  FROM UNNEST(GENERATE_ARRAY((SELECT MIN(x) FROM YourTable), (SELECT MAX(x) FROM YourTable))) AS num
)
SELECT num FROM nums
LEFT JOIN YourTable ON num = x
WHERE x IS NULL
ORDER BY num

Option 2

BigQuery Legacy SQL you can try below (here you need to set start/min and end/max values inside select expression for nums table

SELECT num FROM (
  SELECT num FROM (
    SELECT ROW_NUMBER() OVER() AS num, * 
    FROM (FLATTEN((SELECT SPLIT(RPAD('', 11, '.'),'') AS h FROM (SELECT NULL)), h))
  ) WHERE num BETWEEN 1 AND 11
) AS nums
LEFT JOIN (
  SELECT x FROM
    (SELECT 1 AS x),
    (SELECT 2 AS x),
    (SELECT 3 AS x),
    (SELECT 6 AS x),
    (SELECT 8 AS x),
    (SELECT 10 AS x),
    (SELECT 11 AS x)
) AS YourTable
ON num = x
WHERE x IS NULL

Option 3

BigQuery Legacy SQL - if you want not to have dependency on min and max and have needs in setting those values - you can use below solution - it only requires to set high enough max to accommodate your expected growth (I put 1000 for example)

SELECT num FROM (
  SELECT num FROM (
    SELECT ROW_NUMBER() OVER() AS num, * 
    FROM (FLATTEN((SELECT SPLIT(RPAD('', 1000, '.'),'') AS h FROM (SELECT NULL)), h))
  ) WHERE num BETWEEN 1 AND 1000
) AS nums
LEFT JOIN YourTable
ON num = x
WHERE x IS NULL
AND num BETWEEN (SELECT MIN(x) FROM YourTable) AND (SELECT MAX(x) FROM YourTable)

Option 4 (for some reason - my favorite so far)

BigQuery Standard SQL - without explicit joins

WITH YourTable AS (
  SELECT 1 AS x UNION ALL
  SELECT 2 AS x UNION ALL
  SELECT 3 AS x UNION ALL
  SELECT 6 AS x UNION ALL
  SELECT 8 AS x UNION ALL
  SELECT 10 AS x UNION ALL
  SELECT 11 AS x
)
SELECT num
FROM (SELECT x, LEAD(x) OVER(ORDER BY x) AS next_x FROM YourTable),  
     UNNEST(GENERATE_ARRAY(x + 1,next_x - 1)) AS num
WHERE next_x - x > 1
ORDER BY x

Thanks for the solution. But what if the table(x) keeps increasing...? — Teja, Sep 11 '16 at 04:57
use first solution in my answer (standard sql) - it takes care of this! — Mikhail Berlyant, Sep 11 '16 at 04:58
In the first solution I am trying to implement using postgresql but those functions do not work... Tried using sqlfiddle link as below. http://sqlfiddle.com/#!15/19495/4 — Teja, Sep 11 '16 at 05:11
Option 2 works fine and its limited ... In option 1 cte's do not work on Bigquery.. Just ran it now... — Teja, Sep 11 '16 at 05:14
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/123085/discussion-between-soaddict-and-mikhail-berlyant). — Teja, Sep 11 '16 at 05:15
to run option1 you should enable standard sql - https://cloud.google.com/bigquery/sql-reference/enabling-standard-sql — Mikhail Berlyant, Sep 11 '16 at 05:16
and if option 2 works for yo - it means option 3 should also work for you! — Mikhail Berlyant, Sep 11 '16 at 05:16
Ravi, I hope you at least was able to finally run those in BigQuery! even though looks like you were more interested in postgresql version. otherwise i just wasted my time :o( — Mikhail Berlyant, Sep 12 '16 at 20:49

Patrick · Answer 2 · 2016-09-12T07:00:31.617

Your query can be written much more succinctly like so:

SELECT x
FROM (
    SELECT x,
           lag(x, 1) OVER ( ORDER BY x ) prev_x
    FROM ( VALUES (1), (2), (3), (4), (5), (6), (8), (10), (11) ) v(x)
) sub
WHERE x-prev_x > 1;

This will return the next highest value after a miss (8, 10), not the missing values themselves (7, 9). But of course you do not have the values handy.

If you know the range of values in the sequence, then you can use this:

SELECT s.x
FROM generate_series(<<min>>, <<max>>) s(x)
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL;

This returns the actual missing values.

If you do not know the range of values, you need to add a sub-query:

SELECT s.x
FROM ( SELECT min(x), max(x) FROM my_table ) r
JOIN generate_series(r.min, r.max) s(x) ON true
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL;

Alternatively, instead of a LEFT JOIN:

SELECT x
FROM ( SELECT min(x), max(x) FROM my_table ) r,
     generate_series(r.min, r.max) s(x)
WHERE NOT EXISTS (SELECT 1 FROM my_table t WHERE t.x = s.x);

This is a solid answer. I don't see a reason for downvotes. Minor confusing detail: the OP's example omits 7 and 9 (not 8 and 9). — Erwin Brandstetter, Sep 11 '16 at 14:51
@ErwinBrandstetter A down-vote is only a click away (but I did mess up the parentheses in the VALUES clause in my initial answer). Thanks for the thumbs-up. List of values corrected. — Patrick, Sep 12 '16 at 07:03

score 1 · Accepted Answer · edited May 23 '17 at 12:30

The shortest solution in Postgres is with standard SQL EXCEPT:

WITH tbl(x) AS (SELECT unnest ('{1,2,3,4,5,6,8,10,11}'::int[]))
-- the CTE provides a temp table - might be an actual table instead
SELECT generate_series(min(x), max(x)) FROM tbl
EXCEPT ALL
TABLE  tbl;

The set-returning function unnest() is Postgres specific and just the shortest syntax to provide your set of numbers as table.

Also works with duplicates or NULL values in the data.

TABLE tbl is (standard SQL!) short syntax for SELECT * FROM tbl:

Is there a shortcut for SELECT * FROM in psql?

Related (with more explanation):

I am fine with this solution for postgres. Is there any way to implement this without using the functions..? — Teja, Sep 12 '16 at 18:21

Given a sequence of numbers how to identify the missing numbers

3 Answers3