How to select rows, and nearby rows

Question

Background

I have a table of values that some need attention:

| ID      | AddedDate   |
|---------|-------------|
|       1 | 2010-04-01  |
|       2 | 2010-04-01  |
|       3 | 2010-04-02  |
|       4 | 2010-04-02  |
|       5 | NULL        | <----------- needs attention
|       6 | 2010-04-02  |
|       7 | 2010-04-03  |
|       8 | 2010-04-04  |
|       9 | 2010-04-04  |
| 2432659 | 2016-06-15  |
| 2432650 | 2016-06-16  |
| 2432651 | 2016-06-17  |
| 2432672 | 2016-06-18  |
| 2432673 | NULL        | <----------- needs attention
| 2432674 | 2016-06-20  |
| 2432685 | 2016-06-21  |

I want to select the rows where AddedDate is null, and i want to select rows around it. In this example question it would be sufficient to say rows where the ID is ±3. This means i want:

| ID      | AddedDate   |
|---------|-------------|
|       2 | 2010-04-01  | ─╮
|       3 | 2010-04-02  |  │
|       4 | 2010-04-02  |  │
|       5 | NULL        |  ├──ID values ±3
|       6 | 2010-04-02  |  │
|       7 | 2010-04-03  |  │
|       8 | 2010-04-04  | ─╯

| 2432672 | 2016-06-18  | ─╮
| 2432673 | NULL        |  ├──ID values ±3
| 2432674 | 2016-06-20  | ─╯

Note: In reality it's a table of 9M rows, and 15k need attention.

Attempts

First i create a query that builds the ranges i'm interested in returning:

SELECT
  ID-3 AS [Low ID],
  ID+3 AS [High ID]
FROM Items
WHERE AddedDate IS NULL

Low ID   High ID
-------  -------
2        8 
2432670  2432676

So my initial attempt to use this does work:

WITH dt AS (
   SELECT ID-3 AS Low, ID+3 AS High
   FROM Items
   WHERE AddedDate IS NULL
)
SELECT * FROM Items
WHERE EXISTS(
    SELECT 1 FROM dt
    WHERE Items.ID BETWEEN dt.Low AND dt.High)

But when i try it on real data:

9 million total rows
15,000 interesting rows
subtree cost of 63,318,400
it takes hours (before i give up and cancel it)

There's probably a more efficient way.

Bonus Reading

score 4 · Answer 1 · answered Feb 24 '18 at 16:59

Here is one method that uses the windowing clause:

select i.*
from (select i.*,
             count(*) over (order by id rows between 3 preceding and 1 preceding) as cnt_prec,
             count(*) over (order by id rows between 1 following and 3 following) as cnt_foll,
             count(addeddate) over (order by id rows between 3 preceding and 1 preceding) as cnt_ad_prec,
             count(addeddate) over (order by id rows between 1 following and 3 following) as cnt_ad_foll
      from items
     ) i
where cnt_ad_prec <> cnt_prec or
      cnt_ad_foll <> cnt_foll or
      addeddate is null;
order by id;

This returns all rows that have NULL in the column or are within three rows of a NULL.

The need for the comparison to the count is to avoid the edge issues on the smallest and largest ids.

i think this will return 3 previous and 3 next rows of the null row..even if they are not id+/-3 — Vamsi Prabhala, Feb 24 '18 at 17:10
Btw, this could be simplified by merging two counts into one using `rows between 3 preceding and 3 following` — dnoeth, Feb 25 '18 at 10:05

score 4 · Accepted Answer · answered Feb 24 '18 at 17:29

4

This is your existing logic rewritten using an moving max:

WITH dt AS (
   SELECT
      ID, AddedDate,
      -- check if there's a NULL within a range of +/- 3 rows
      -- and remember it's ID 
      max(case when AddedDate is null then id end)
      over (order by id 
            rows between 3 preceding and 3 following) as NullID
   FROM Items 
)
SELECT *
FROM dt
where id between NullID-3 and NullID+3

answered Feb 24 '18 at 17:29

dnoeth

59,503
4
39
56

This seems like the most concise method. – Gordon Linoff Feb 24 '18 at 17:49
This method seems to miss returning some *"interesting"* rows; i think when there are a bunch of null rows together. – Ian Boyd Feb 24 '18 at 19:10
Can you add an example to your fiddle where it fails? – dnoeth Feb 24 '18 at 22:19
@GordonLinoff: This will not change the result, because all NULLs are reported anyway, they always fall in that +/-3 range :-) – dnoeth Feb 26 '18 at 07:39

i486 · Answer 3 · 2018-02-24T17:15:44.177

3

Another way:

SELECT i1.*
    FROM Items i1, Items i2
        WHERE i2.AddedDate IS NULL AND ABS(i1.ID - i2.ID) <= 3

I hope there is index on AddedDate column.

edited Feb 24 '18 at 17:15

answered Feb 24 '18 at 17:06

i486

6,491
4
24
41

With an estimated subtree cost of 15M, it let it ran for 45 minutes before cancelling. But it's a much more understandable method than the others. – Ian Boyd Feb 24 '18 at 18:08
1

This would be a good use of Filtered Indexes in SQL server 2008 or newer. – Sql Surfer Feb 24 '18 at 18:25
1

@IanBoyd The SQL Fiddle shows different plans for this form and `WHERE i2.AddedDate IS NULL AND i1.ID BETWEEN i2.ID - 3 AND i2.ID + 3`. Is it also different for you? – GSerg Feb 24 '18 at 19:51
@SqlSurfer I don't think it would be. If you mean a filtered index on `AddedDate`, then a regular index will give you `null` values just as fast, and you cannot have a filtered index for the `ID+-3` situation. – GSerg Feb 24 '18 at 19:53

Jorge Y. · Answer 4 · 2018-02-24T18:49:32.590

Just to try a different approach than other answers... How about using a table variable to store the ids you want. Then you join. My hope is that the insert executes fast enough, and then the SELECT can take advantage of the clustered index in Items. Unfortunately I don't have here your amount of data to test its efficiency:

DECLARE @userData TABLE(
    idInRange int NOT NULL
)

INSERT INTO @userData (idInRange)
SELECT DISTINCT i.Id + r 
FROM Items i 
CROSS JOIN (
  SELECT -3 as r UNION ALL SELECT -2 as r UNION ALL SELECT -1 as r UNION ALL SELECT 0 as r UNION ALL
  SELECT 1 as r UNION ALL SELECT 2 as r UNION ALL SELECT 3 as r 
) yourRange
WHERE AddedDate IS NULL;

SELECT i.*
FROM @userData u
INNER JOIN Items i ON i.ID = u.idInRange

Edited to add a DISTINCT when filling the table variable, to avoid duplicated rows just in case there are two contiguous NULL dates and their id ranges overlap

I suspect that is exactly what the server is doing in that `Table spool (Lazy spool)`. — GSerg, Feb 24 '18 at 19:59

How to select rows, and nearby rows

Background

Attempts

Bonus Reading

4 Answers4