
We have a table with an auto-increment ID column that has gaps. Each row also has a created date.

We are trying to find, for each gap ID (i.e. each missing one), the prev/next created date that is available in our table.

We have already built a SQL query that identifies the gaps (solution from here), so I have them cached in a table; but from there, how do we find the prev/next created dates that cover each gap in the original table?

Input:

+----+------------+
| id |     dt     |
+----+------------+
| 84 | 1443728132 |
| 91 | 1443728489 |
| 93 | 1443729058 |
| 94 | 1443729200 |
+----+------------+

Output:

+--------+------------+------------+
| gap_id |  prev_dt   |  next_dt   |
+--------+------------+------------+
|     85 | 1443728132 | 1443728489 |
|     86 | 1443728132 | 1443728489 |
|     87 | 1443728132 | 1443728489 |
|     88 | 1443728132 | 1443728489 |
|     89 | 1443728132 | 1443728489 |
|     90 | 1443728132 | 1443728489 |
|     92 | 1443728489 | 1443729058 |
+--------+------------+------------+

I ended up doing this with Google BigQuery.

Pentium10

3 Answers


I would recommend grouping the results into ranges:

select min(id + 1) as first_missing_id,
       (next_id - 1) as last_missing_id,
       next_dte
from (select t.*,
             lead(id) over (order by id) as next_id,
             lead(dte) over (order by id) as next_dte
      from t
     ) t
where next_id <> id + 1
group by next_id, next_dte;

Getting individual ids is tricky. After all, if you have ids 1, 1000000, and 1000000000, you could end up generating a huge number of rows.
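As a quick sanity check of this range-grouping approach, here is a minimal sketch using Python's built-in `sqlite3` module (assuming SQLite 3.25+, which added window functions); the table name `t` and columns `id`/`dte` follow the query above:

```python
# Reproduce the range-grouping query on the sample data from the question,
# using an in-memory SQLite database as a stand-in for the real table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dte INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(84, 1443728132), (91, 1443728489),
                  (93, 1443729058), (94, 1443729200)])

rows = conn.execute("""
    SELECT MIN(id + 1) AS first_missing_id,
           (next_id - 1) AS last_missing_id,
           next_dte
    FROM (SELECT t.*,
                 LEAD(id)  OVER (ORDER BY id) AS next_id,
                 LEAD(dte) OVER (ORDER BY id) AS next_dte
          FROM t) t
    WHERE next_id <> id + 1
    GROUP BY next_id, next_dte
""").fetchall()

# One row per gap range rather than one row per missing id.
print(sorted(rows))  # [(85, 90, 1443728489), (92, 92, 1443729058)]
```

Note that, as written, the query returns only the date on the far side of each gap (`next_dte`); the preceding date can be carried along the same way with an extra plain column in the subquery.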

Gordon Linoff

This should work on most databases other than MySQL (it relies on the `LAG` window function, which MySQL did not support at the time):

select      *
from       (select      lag (id) over (order by id) + 1                              as gap_start
                       ,id - 1                                                      as gap_end
                       ,lag (dt) over (order by id)                                 as dt_before_gap
                       ,dt                                                          as dt_after_gap
                       ,case when lag (id) over (order by id) + 1 <> id then 'Y' end as is_gap
            from        t
            ) t
where       is_gap = 'Y'
;
David דודו Markovitz

For BigQuery Standard SQL:

WITH yourTable AS (
SELECT 84 AS id, 1443728132 AS dt UNION ALL
SELECT 91 AS id, 1443728489 AS dt UNION ALL
SELECT 93 AS id, 1443729058 AS dt UNION ALL
SELECT 94 AS id, 1443729200 AS dt 
),
nums AS (
  SELECT num 
  FROM UNNEST(GENERATE_ARRAY((SELECT MIN(id) FROM yourTable), (SELECT MAX(id) FROM yourTable))) AS num
),
gaps AS (
  SELECT
    LAG (id) OVER (ORDER BY id) + 1 AS gap_start,
    id - 1 AS gap_end,
    LAG (dt) OVER (ORDER BY id) AS prev_dt,
    dt AS next_dt,
    CASE 
      WHEN LAG (id) OVER (ORDER BY id) + 1 <> id THEN 'Y'
    END AS is_gap
  FROM
    yourTable 
)
SELECT num as gap_id, prev_dt, next_dt
FROM gaps JOIN nums 
ON num BETWEEN gap_start AND gap_end
WHERE is_gap = 'Y'
ORDER BY num  

Output:

+--------+------------+------------+
| gap_id |  prev_dt   |  next_dt   |
+--------+------------+------------+
|     85 | 1443728132 | 1443728489 |
|     86 | 1443728132 | 1443728489 |
|     87 | 1443728132 | 1443728489 |
|     88 | 1443728132 | 1443728489 |
|     89 | 1443728132 | 1443728489 |
|     90 | 1443728132 | 1443728489 |
|     92 | 1443728489 | 1443729058 |
+--------+------------+------------+
Mikhail Berlyant
  • Unfortunately I had to use something else, as `GENERATE_ARRAY` works up to 1 million entries only, and I had a lot more entries to deal with. I also ran out of resources because of the `ORDER BY`, so I had to remove it and use a larger static table to avoid any cross join. – Pentium10 Oct 24 '16 at 17:05
  • Sure :o) Please note: the `ORDER BY` is not part of the logic at all, so just remove it; it is in the example purely for presentation purposes! – Mikhail Berlyant Oct 24 '16 at 17:06
  • It would be great if you could share your final solution here. – Mikhail Berlyant Oct 24 '16 at 17:16
  • I used the BQ CLI to build a large static table by running multiple queries of `GENERATE_ARRAY(i,j)`, then removed the `ORDER BY` and split the query into pieces to avoid running out of resources. – Pentium10 Oct 24 '16 at 18:00
  • The limit of one million elements in `GENERATE_ARRAY` is rather arbitrary (we didn't want to make it too easy to end up with "resources exceeded"). How many elements did you end up needing? We could consider increasing the cap for the future. – Elliott Brossard Oct 24 '16 at 19:36
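Following up on the `GENERATE_ARRAY` limit discussed above, one client-side alternative (a sketch only, with a hypothetical `gap_rows` helper) is to derive the gap ranges in application code and expand them lazily, so no million-element array is ever materialized at once:

```python
# Expand gap ranges into individual (gap_id, prev_dt, next_dt) rows,
# one at a time, from a list of (id, dt) pairs sorted by id.
def gap_rows(rows):
    """rows: (id, dt) pairs sorted by id; yields (gap_id, prev_dt, next_dt)."""
    for (prev_id, prev_dt), (next_id, next_dt) in zip(rows, rows[1:]):
        # Every id strictly between two consecutive ids is a gap id.
        for gap_id in range(prev_id + 1, next_id):
            yield gap_id, prev_dt, next_dt

rows = [(84, 1443728132), (91, 1443728489),
        (93, 1443729058), (94, 1443729200)]

for r in gap_rows(rows):
    print(r)
# (85, 1443728132, 1443728489)
# ...
# (92, 1443728489, 1443729058)
```

Because `gap_rows` is a generator, the output can be streamed (e.g. back into a load job) without ever holding all gap ids in memory.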