1

So given a table like the one below, I would like to grab rows where id has at least three consecutive years.

+---------+--------+
|    id   |  year  |
+------------------+
|    2    |   2003 |
|    2    |   2004 |
|    1    |   2005 |
|    2    |   2005 |
|    1    |   2007 |
|    1    |   2008 |
+---------+--------+

The result over here would be of course:

+---------+         
|   id    |         
+---------+         
|    2    |         
+---------+      

Any input at all as to how I could go about structuring a query to do this would be great.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Louis93
  • 3,843
  • 8
  • 48
  • 94

3 Answers3

1

You can use JOIN approach (self-join):

SELECT t1.id
FROM tbl t1 
JOIN tbl t2 ON t2.year = t1.year + 1
           AND t1.id = t2.id
JOIN tbl t3 ON t3.year = t1.year + 2
           AND t1.id = t3.id

SQLFiddle

potashin
  • 44,205
  • 11
  • 83
  • 107
1

This one works and can be fast when you have at least an index on the id-field:

WITH t1 AS (
    SELECT  *
    FROM    (VALUES
            (2,2003),
            (2,2004),
            (1,2005),
            (2,2005),
            (1,2007),
            (1,2008)
            ) v(id, year) 
)
SELECT  DISTINCT t1.id
FROM    t1 -- your tablename
    JOIN t1 AS t2 ON t1.id = t2.id AND t1.year + 1 = t2.year
    JOIN t1 AS t3 ON t1.id = t3.id AND t1.year + 2 = t3.year;
Frank Heikens
  • 117,544
  • 24
  • 142
  • 135
1

Combination (id, year) is UNIQUE

Typically guaranteed with a PRIMARY KEY or UNIQUE constraint or a unique index.

This is a general solution for any minimum number of consecutive rows:

SELECT DISTINCT id
FROM  (
   SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
   FROM   tbl
   ) sub
GROUP  BY id, grp
HAVING count(*) > 2;  -- minimum: 3

This should be faster than self-joining repeatedly, because only a single scan on the base table is needed. Test performance with EXPLAIN ANALYZE.

Related answer with detailed explanation:

Combination (id, year) is not UNIQUE

You can make it unique in a first step.

SELECT DISTINCT id
FROM  (
   SELECT id, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
   FROM   tbl
   GROUP  BY id, year
   ) sub
GROUP  BY id, grp
HAVING count(*) > 2;  -- minimum: 3

SQL Fiddle.

Or you could use the window function dense_rank() instead of row_number() and then count(DISTINCT year), but I don't see a benefit in this approach.

Understanding the sequence of events in a SELECT query is the key:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228