3

I'm attempting to query a table which contains a character varying[] column of years, and return those years as a string of comma-delimited year ranges. The year ranges would be determined by sequential years present within the array, and years/year ranges which are not sequential should be separated be commas.

The reason the data-type is character varying[] rather than integer[] is because a few of the values contain ALL instead of a list of years. We can omit these results.

So far I've had little luck approaching the problem as I'm not really even sure where to start.

Would someone be able to give me some guidance or provide a useful examples of how one might solve such as challenge?

years_table Example

+=========+============================+
| id      | years                      |
| integer | character varying[]        |
+=========+============================+
| 1       | {ALL}                      |
| 2       | {1999,2000,2010,2011,2012} |
| 3       | {1990,1991,2007}           |
+---------+----------------------------+

Output Goal:

Example SQL Query:

SELECT id, [year concat logic] AS year_ranges
FROM years_table WHERE 'ALL' NOT IN years

Result:

+====+======================+
| id | year_ranges          |
+====+======================+
| 2  | 1999-2000, 2010-2012 |
| 3  | 1990-1991, 2007      |
+----+----------------------+
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Joshua Burns
  • 8,268
  • 4
  • 48
  • 61
  • What determines the 1999-2000 vs the jump between 2010-2012? Assuming I didn't overlook something, I think you need to explain the ranges more clearly. – bma Jul 08 '13 at 18:09
  • Sorry, I had modified the introductory sentence and left our some much needed details. Edited. Essentially, I'm trying to create year ranges out of sequential year ranges. `{2000,2001,2002}` would return `2000-2002`. Years/year ranges which are not sequential would return delimited by a comma. Eg: `{2000,2002}` would return `2000, 2002`, and `{2000,2001,2002,2005,2006,2008}` would return `2000-2002, 2005-2006, 2008`. – Joshua Burns Jul 08 '13 at 18:28
  • I have to run for a bit, but one thought that immediately comes to mind is the use of LAG or LEAD http://www.postgresql.org/docs/current/static/functions-window.html – bma Jul 08 '13 at 18:34

2 Answers2

4
SELECT id, string_agg(year_range, ', ') AS year_ranges
FROM (
   SELECT id, CASE WHEN count(*) > 1
               THEN min(year)::text || '-' ||  max(year)::text 
               ELSE min(year)::text
              END AS year_range
   FROM  (
      SELECT *, row_number() OVER (ORDER BY id, year) - year AS grp
      FROM  (
         SELECT id, unnest(years) AS year
         FROM  (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
                      ,(3,      '{1990,1991,2007}')
               ) AS tbl(id, years)
         ) sub1
      ) sub2
   GROUP  BY id, grp
   ORDER  BY id, min(year)
   ) sub3
GROUP  BY id
ORDER  BY id

Produces exactly the desired result.

If you deal with an an array of varchar (varchar[], just cast it to int[], before you proceed. It seems to be in perfectly legal form for that:

years::int[]

Replace the inner sub-select with the name of your source table in productive code.

 FROM  (VALUES (2::int, '{1999,2000,2010,2011,2012}'::int[])
              ,(3,      '{1990,1991,2007}')
       ) AS tbl(id, years)

->

FROM  tbl

Since we are dealing with a naturally ascending number (the year) we can use a shortcut to form groups of consecutive years (forming a range). I subtract the year itself from row number (ordered by year). For consecutive years, both row number and year increment by one and produce the same grp number. Else, a new range starts.

More on window functions in the manual here and here.

A plpgsql function might be even faster in this case. You'd have to test. Examples in these related answers:
Ordered count of consecutive repeats / duplicates
ROW_NUMBER() shows unexpected values

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
2

SQL Fiddle Not the output format you asked for but I think it can be more useful:

select id, g, min(year), max(year)
from (
    select id, year,
        count(not g or null) over(partition by id order by year) as g
    from (
        select id, year,
            lag(year, 1, 0) over(partition by id order by year) = year - 1 as g
        from (
            select id, unnest(years)::integer as year
            from years
            where years != '{ALL}'
        ) s
    ) s
) s
group by 1, 2
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260