Return array of years as year ranges

Question

I'm attempting to query a table which contains a character varying[] column of years, and return those years as a string of comma-delimited year ranges. The year ranges would be determined by sequential years present within the array, and years/year ranges which are not sequential should be separated be commas.

The reason the data-type is character varying[] rather than integer[] is because a few of the values contain ALL instead of a list of years. We can omit these results.

So far I've had little luck approaching the problem as I'm not really even sure where to start.

Would someone be able to give me some guidance or provide a useful examples of how one might solve such as challenge?

years_table Example

+=========+============================+
| id      | years                      |
| integer | character varying[]        |
+=========+============================+
| 1       | {ALL}                      |
| 2       | {1999,2000,2010,2011,2012} |
| 3       | {1990,1991,2007}           |
+---------+----------------------------+

Output Goal:

Example SQL Query:

SELECT id, [year concat logic] AS year_ranges
FROM years_table WHERE 'ALL' NOT IN years

Result:

+====+======================+
| id | year_ranges          |
+====+======================+
| 2  | 1999-2000, 2010-2012 |
| 3  | 1990-1991, 2007      |
+----+----------------------+

What determines the 1999-2000 vs the jump between 2010-2012? Assuming I didn't overlook something, I think you need to explain the ranges more clearly. — bma, Jul 08 '13 at 18:09
Sorry, I had modified the introductory sentence and left our some much needed details. Edited. Essentially, I'm trying to create year ranges out of sequential year ranges. `{2000,2001,2002}` would return `2000-2002`. Years/year ranges which are not sequential would return delimited by a comma. Eg: `{2000,2002}` would return `2000, 2002`, and `{2000,2001,2002,2005,2006,2008}` would return `2000-2002, 2005-2006, 2008`. — Joshua Burns, Jul 08 '13 at 18:28
I have to run for a bit, but one thought that immediately comes to mind is the use of LAG or LEAD http://www.postgresql.org/docs/current/static/functions-window.html — bma, Jul 08 '13 at 18:34

score 4 · Accepted Answer · edited May 23 '17 at 11:56

4

SELECT id, string_agg(year_range, ', ') AS year_ranges
FROM (
   SELECT id, CASE WHEN count(*) > 1
               THEN min(year)::text || '-' ||  max(year)::text 
               ELSE min(year)::text
              END AS year_range
   FROM  (
      SELECT *, row_number() OVER (ORDER BY id, year) - year AS grp
      FROM  (
         SELECT id, unnest(years) AS year
         FROM  (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
                      ,(3,      '{1990,1991,2007}')
               ) AS tbl(id, years)
         ) sub1
      ) sub2
   GROUP  BY id, grp
   ORDER  BY id, min(year)
   ) sub3
GROUP  BY id
ORDER  BY id

Produces exactly the desired result.

If you deal with an an array of varchar (varchar[], just cast it to int[], before you proceed. It seems to be in perfectly legal form for that:

years::int[]

Replace the inner sub-select with the name of your source table in productive code.

 FROM  (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
              ,(3,      '{1990,1991,2007}')
       ) AS tbl(id, years)

->

FROM  tbl

Since we are dealing with a naturally ascending number (the year) we can use a shortcut to form groups of consecutive years (forming a range). I subtract the year itself from row number (ordered by year). For consecutive years, both row number and year increment by one and produce the same grp number. Else, a new range starts.

More on window functions in the manual here and here.

A plpgsql function might be even faster in this case. You'd have to test. Examples in these related answers:
Ordered count of consecutive repeats / duplicates
ROW_NUMBER() shows unexpected values

edited May 23 '17 at 11:56

Community

1
1

answered Jul 08 '13 at 18:42

Erwin Brandstetter

605,456
145
1,078
1,228

1

[It fails it I add this value](http://www.sqlfiddle.com/#!12/9eecb/8): `,(4, '{2010,2011}')` – Clodoaldo Neto Jul 08 '13 at 18:56
@ClodoaldoNeto: Thanks. [Fixed now.](http://www.sqlfiddle.com/#!12/9eecb/9) I forgot to add `id` to the `ORDER BY` clause of the window function when generalizing for n rows. – Erwin Brandstetter Jul 08 '13 at 19:04
1

@JoshuaBurns: Note the fix. – Erwin Brandstetter Jul 08 '13 at 19:06
1

aha, it looks like youre last update nailed it: http://www.sqlfiddle.com/#!12/a1227/14 – Joshua Burns Jul 08 '13 at 19:11
1

absolutely amazing sir. thank you so much for taking the time provide such a concise example. now it's time to learn more about `row_number() OVER`. :) – Joshua Burns Jul 08 '13 at 19:14
1

@JoshuaBurns: I added links to the manual to get you started on window functions. – Erwin Brandstetter Jul 08 '13 at 19:18
I see you added code at [SQL Fiddle](http://www.sqlfiddle.com/#!12/a1227/14) to deal with the `ALL` value but the code in the posted answer does not provide for that value. – Clodoaldo Neto Jul 08 '13 at 19:47

score 2 · Answer 2 · answered Jul 08 '13 at 18:36

SQL Fiddle Not the output format you asked for but I think it can be more useful:

select id, g, min(year), max(year)
from (
    select id, year,
        count(not g or null) over(partition by id order by year) as g
    from (
        select id, year,
            lag(year, 1, 0) over(partition by id order by year) = year - 1 as g
        from (
            select id, unnest(years)::integer as year
            from years
            where years != '{ALL}'
        ) s
    ) s
) s
group by 1, 2

Return array of years as year ranges

2 Answers2

Linked