1

Test data:

sqlite3 tmp.sqlite "create table t (id integer, val integer)"
sqlite3 tmp.sqlite <<EOF
insert into t values
  (0, 0),
  (1, 5),
  (2, 10),
  (3, 14),
  (4, 15),
  (5, 16),
  (6, 20),
  (7, 25),
  (8, 29),
  (9, 30),
  (10, 30),
  (11, 31),
  (12, 35),
  (13, 40)
EOF

I will provide two hyper parameters:

  • diff: how far val can be between two rows so that they are considered neighbours.
  • min_neighbours: how many neighboring rows are needed to consider a group a cluster.

E.g. given:

  • diff: 1
  • min_neighbours: 2

I want output:

id val n_neighbours
3 14 2
4 15 3
5 16 2
8 29 3
9 30 4
10 30 4
11 31 3

This is because how many neighbours each row has:

id n_neighbours
0 0 1
1 5 1
2 10 1
3 14 2
4 15 3
5 16 2
6 20 1
7 25 1
8 29 3
9 30 4
10 30 4
11 31 3
12 35 1
13 40 1

For example:

  • row 0: neighbours have value between -1 and 1, so there are no neighbours except itself.
  • row 3: neighbours have value between 13 and 15. Therefore there is one neighbour except itself: row 4 which has val 15.
  • row 4: neighbours have value between 14 and 16. Therefore two neighbours except itself: row 3 and row 5.

Similar questions are either not about integer ranges or fail to precisely define what similar is supposed to mean:

user4157124
  • 2,809
  • 13
  • 27
  • 42
Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985

1 Answers1

0

RANGE window function

This is a perfect application for the RANGE window function, which SQLite supports https://www.sqlite.org/windowfunctions.html

SELECT * FROM (
  SELECT id, val, COUNT(*) OVER (
    ORDER BY val RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) as c FROM t
) WHERE c > 1

produces the desired result:

3|14|2
4|15|3
5|16|2
8|29|3
9|30|4
10|30|4
11|31|3

To understand this let's run just the inner query:

SELECT id, val, COUNT(*) OVER (
  ORDER BY val RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
) as c FROM t

which produces the counts:

0|0|1
1|5|1
2|10|1
3|14|2
4|15|3
5|16|2
6|20|1
7|25|1
8|29|3
9|30|4
10|30|4
11|31|3
12|35|1
13|40|1

What RANGE does is exactly what we want: it looks for an unspecified number of rows before and after the current one, and checks if their value is in our desired range.

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985