Background
I have a table of values that some need attention:
| ID | AddedDate |
|---------|-------------|
| 1 | 2010-04-01 |
| 2 | 2010-04-01 |
| 3 | 2010-04-02 |
| 4 | 2010-04-02 |
| 5 | NULL | <----------- needs attention
| 6 | 2010-04-02 |
| 7 | 2010-04-03 |
| 8 | 2010-04-04 |
| 9 | 2010-04-04 |
| 2432659 | 2016-06-15 |
| 2432650 | 2016-06-16 |
| 2432651 | 2016-06-17 |
| 2432672 | 2016-06-18 |
| 2432673 | NULL | <----------- needs attention
| 2432674 | 2016-06-20 |
| 2432685 | 2016-06-21 |
I want to select the rows where AddedDate
is null, and i want to select rows around it. In this example question it would be sufficient to say rows where the ID
is ±3. This means i want:
| ID | AddedDate |
|---------|-------------|
| 2 | 2010-04-01 | ─╮
| 3 | 2010-04-02 | │
| 4 | 2010-04-02 | │
| 5 | NULL | ├──ID values ±3
| 6 | 2010-04-02 | │
| 7 | 2010-04-03 | │
| 8 | 2010-04-04 | ─╯
| 2432672 | 2016-06-18 | ─╮
| 2432673 | NULL | ├──ID values ±3
| 2432674 | 2016-06-20 | ─╯
Note: In reality it's a table of 9M rows, and 15k need attention.
Attempts
First i create a query that builds the ranges i'm interested in returning:
SELECT
ID-3 AS [Low ID],
ID+3 AS [High ID]
FROM Items
WHERE AddedDate IS NULL
Low ID High ID
------- -------
2 8
2432670 2432676
So my initial attempt to use this does work:
WITH dt AS (
SELECT ID-3 AS Low, ID+3 AS High
FROM Items
WHERE AddedDate IS NULL
)
SELECT * FROM Items
WHERE EXISTS(
SELECT 1 FROM dt
WHERE Items.ID BETWEEN dt.Low AND dt.High)
But when i try it on real data:
- 9 million total rows
- 15,000 interesting rows
- subtree cost of 63,318,400
- it takes hours (before i give up and cancel it)
There's probably a more efficient way.