
We have a table with an auto-increment ID column that has gaps. Each row also has a created date.

We are trying to find, for each gap ID (i.e. each missing one), the prev/next created date that is available in our table.

We have already built a SQL query that identifies the gaps (solution from here), so I have them cached in a table; but from there, how do we find the prev/next created dates that cover each gap in the original table?

Input:

+----+------------+
| id |     dt     |
+----+------------+
| 84 | 1443728132 |
| 91 | 1443728489 |
| 93 | 1443729058 |
| 94 | 1443729200 |
+----+------------+

Output:

+--------+------------+------------+
| gap_id |  prev_dt   |  next_dt   |
+--------+------------+------------+
|     85 | 1443728132 | 1443728489 |
|     86 | 1443728132 | 1443728489 |
|     87 | 1443728132 | 1443728489 |
|     88 | 1443728132 | 1443728489 |
|     89 | 1443728132 | 1443728489 |
|     90 | 1443728132 | 1443728489 |
|     92 | 1443728489 | 1443729058 |
+--------+------------+------------+

I ended up doing this with Google BigQuery.

Pentium10

3 Answers


I would recommend grouping the results into ranges:

select min(id + 1) as first_missing_id,
       (next_id - 1) as last_missing_id,
       next_dte
from (select t.*,
             lead(id) over (order by id) as next_id,
             lead(dte) over (order by id) as next_dte
      from t
     ) t
where next_id <> id + 1
group by next_id, next_dte;

Getting individual ids is tricky. After all, if you have ids 1, 1000000, and 1000000000, you could end up generating a huge number of rows.
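As a quick sanity check of this range-grouping approach, here is a minimal sketch using Python's built-in `sqlite3` module (assuming SQLite 3.25+, which added window functions); the table name `t` and columns `id`/`dte` follow the query above:

```python
# Reproduce the range-grouping query on the sample data from the question,
# using an in-memory SQLite database as a stand-in for the real table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dte INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(84, 1443728132), (91, 1443728489),
                  (93, 1443729058), (94, 1443729200)])

rows = conn.execute("""
    SELECT MIN(id + 1) AS first_missing_id,
           (next_id - 1) AS last_missing_id,
           next_dte
    FROM (SELECT t.*,
                 LEAD(id)  OVER (ORDER BY id) AS next_id,
                 LEAD(dte) OVER (ORDER BY id) AS next_dte
          FROM t) t
    WHERE next_id <> id + 1
    GROUP BY next_id, next_dte
""").fetchall()

# One row per gap range rather than one row per missing id.
print(sorted(rows))  # [(85, 90, 1443728489), (92, 92, 1443729058)]
```

Note that, as written, the query returns only the date on the far side of each gap (`next_dte`); the preceding date can be carried along the same way with an extra plain column in the subquery.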

Gordon Linoff

This should work on most databases other than MySQL (it relies on the `LAG` window function, which MySQL did not support at the time):

select      *
from       (select      lag (id) over (order by id) + 1                              as gap_start
                       ,id - 1                                                      as gap_end
                       ,lag (dt) over (order by id)                                 as dt_before_gap
                       ,dt                                                          as dt_after_gap
                       ,case when lag (id) over (order by id) + 1 <> id then 'Y' end as is_gap
            from        t
            ) t
where       is_gap = 'Y'
;
David דודו Markovitz

For BigQuery Standard SQL:

WITH yourTable AS (
SELECT 84 AS id, 1443728132 AS dt UNION ALL
SELECT 91 AS id, 1443728489 AS dt UNION ALL
SELECT 93 AS id, 1443729058 AS dt UNION ALL
SELECT 94 AS id, 1443729200 AS dt 
),
nums AS (
  SELECT num 
  FROM UNNEST(GENERATE_ARRAY((SELECT MIN(id) FROM yourTable), (SELECT MAX(id) FROM yourTable))) AS num
),
gaps AS (
  SELECT
    LAG (id) OVER (ORDER BY id) + 1 AS gap_start,
    id - 1 AS gap_end,
    LAG (dt) OVER (ORDER BY id) AS prev_dt,
    dt AS next_dt,
    CASE 
      WHEN LAG (id) OVER (ORDER BY id) + 1 <> id THEN 'Y'
    END AS is_gap
  FROM
    yourTable 
)
SELECT num as gap_id, prev_dt, next_dt
FROM gaps JOIN nums 
ON num BETWEEN gap_start AND gap_end
WHERE is_gap = 'Y'
ORDER BY num  

Output:

+--------+------------+------------+
| gap_id |  prev_dt   |  next_dt   |
+--------+------------+------------+
|     85 | 1443728132 | 1443728489 |
|     86 | 1443728132 | 1443728489 |
|     87 | 1443728132 | 1443728489 |
|     88 | 1443728132 | 1443728489 |
|     89 | 1443728132 | 1443728489 |
|     90 | 1443728132 | 1443728489 |
|     92 | 1443728489 | 1443729058 |
+--------+------------+------------+
Mikhail Berlyant
  • Unfortunately I had to use something else, as `GENERATE_ARRAY` works up to 1 million entries only, and I had a lot more entries to deal with. I also ran out of resources because of the `ORDER BY`, so I had to remove it and use a larger static table to avoid any cross join. – Pentium10 Oct 24 '16 at 17:05
  • Sure :o) Please note: the `ORDER BY` is not part of the logic at all, so just remove it; it is in the example purely for presentation purposes! – Mikhail Berlyant Oct 24 '16 at 17:06
  • It would be great if you could share your final solution here. – Mikhail Berlyant Oct 24 '16 at 17:16
  • I used the BQ CLI to build a large static table by running multiple queries of `GENERATE_ARRAY(i,j)`, then removed the `ORDER BY` and split the query into pieces to avoid running out of resources. – Pentium10 Oct 24 '16 at 18:00
  • The limit of one million elements in `GENERATE_ARRAY` is rather arbitrary (we didn't want to make it too easy to end up with "resources exceeded"). How many elements did you end up needing? We could consider increasing the cap for the future. – Elliott Brossard Oct 24 '16 at 19:36
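Following up on the `GENERATE_ARRAY` limit discussed above, one client-side alternative (a sketch only, with a hypothetical `gap_rows` helper) is to derive the gap ranges in application code and expand them lazily, so no million-element array is ever materialized at once:

```python
# Expand gap ranges into individual (gap_id, prev_dt, next_dt) rows,
# one at a time, from a list of (id, dt) pairs sorted by id.
def gap_rows(rows):
    """rows: (id, dt) pairs sorted by id; yields (gap_id, prev_dt, next_dt)."""
    for (prev_id, prev_dt), (next_id, next_dt) in zip(rows, rows[1:]):
        # Every id strictly between two consecutive ids is a gap id.
        for gap_id in range(prev_id + 1, next_id):
            yield gap_id, prev_dt, next_dt

rows = [(84, 1443728132), (91, 1443728489),
        (93, 1443729058), (94, 1443729200)]

for r in gap_rows(rows):
    print(r)
# (85, 1443728132, 1443728489)
# ...
# (92, 1443728489, 1443729058)
```

Because `gap_rows` is a generator, the output can be streamed (e.g. back into a load job) without ever holding all gap ids in memory.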