How do I group a collection of elements by whether or not they have consecutive values?

Question

So given a table like the one below, I would like to grab rows where id has at least three consecutive years.

+---------+--------+
|    id   |  year  |
+------------------+
|    2    |   2003 |
|    2    |   2004 |
|    1    |   2005 |
|    2    |   2005 |
|    1    |   2007 |
|    1    |   2008 |
+---------+--------+

The result over here would be of course:

+---------+         
|   id    |         
+---------+         
|    2    |         
+---------+

Any input at all as to how I could go about structuring a query to do this would be great.

Do you want a SQL query statement to do this? – littlet Mar 08 '15 at 12:41 — littlet, Mar 08 '15 at 12:41

potashin · Answer 1 · 2015-03-08T13:19:38.970

1

You can use JOIN approach (self-join):

SELECT t1.id
FROM tbl t1 
JOIN tbl t2 ON t2.year = t1.year + 1
           AND t1.id = t2.id
JOIN tbl t3 ON t3.year = t1.year + 2
           AND t1.id = t3.id

SQLFiddle

edited Mar 08 '15 at 13:19

answered Mar 08 '15 at 12:45

potashin

44,205
11
83
107

Frank Heikens · Accepted Answer · 2015-03-08T13:46:14.030

This one works and can be fast when you have at least an index on the id-field:

WITH t1 AS (
    SELECT  *
    FROM    (VALUES
            (2,2003),
            (2,2004),
            (1,2005),
            (2,2005),
            (1,2007),
            (1,2008)
            ) v(id, year) 
)
SELECT  DISTINCT t1.id
FROM    t1 -- your tablename
    JOIN t1 AS t2 ON t1.id = t2.id AND t1.year + 1 = t2.year
    JOIN t1 AS t3 ON t1.id = t3.id AND t1.year + 2 = t3.year;

score 1 · Answer 3 · edited Jun 20 '20 at 09:12

Combination `(id, year)` is `UNIQUE`

Typically guaranteed with a PRIMARY KEY or UNIQUE constraint or a unique index.

This is a general solution for any minimum number of consecutive rows:

SELECT DISTINCT id
FROM  (
   SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
   FROM   tbl
   ) sub
GROUP  BY id, grp
HAVING count(*) > 2;  -- minimum: 3

This should be faster than self-joining repeatedly, because only a single scan on the base table is needed. Test performance with EXPLAIN ANALYZE.

Related answer with detailed explanation:

Select longest continuous sequence

Combination `(id, year)` is not `UNIQUE`

You can make it unique in a first step.

SELECT DISTINCT id
FROM  (
   SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
   FROM   tbl
   GROUP  BY id, year
   ) sub
GROUP  BY id, grp
HAVING count(*) > 2;  -- minimum: 3

SQL Fiddle.

Or you could use the window function dense_rank() instead of row_number() and then count(DISTINCT year), but I don't see a benefit in this approach.

Understanding the sequence of events in a SELECT query is the key:

Best way to get result count before LIMIT was applied

How do I group a collection of elements by whether or not they have consecutive values?

3 Answers3

Combination (id, year) is UNIQUE

Combination (id, year) is not UNIQUE

Combination `(id, year)` is `UNIQUE`

Combination `(id, year)` is not `UNIQUE`