Assume you have (in Postgres 9.1) a table like this:

date | value 

which has some gaps in it (I mean: not every possible date between min(date) and max(date) has its own row).

My problem is how to aggregate this data so that each consistent group (without gaps) is treated separately, like this:

min_date | max_date | [some aggregate of "value" column] 

Any ideas how to do it? I believe it is possible with window functions, but after a while of trying lag() and lead() I'm a little stuck.

For instance if the data are like this:

 date          | value  
---------------+-------  
 2011-10-31    | 2  
 2011-11-01    | 8  
 2011-11-02    | 10  
 2012-09-13    | 1  
 2012-09-14    | 4  
 2012-09-15    | 5  
 2012-09-16    | 20  
 2012-10-30    | 10  

the output (for sum as the aggregate) would be:

   min     |    max     |  sum  
-----------+------------+-------  
2011-10-31 | 2011-11-02 |  20  
2012-09-13 | 2012-09-16 |  30  
2012-10-30 | 2012-10-30 |  10  
Clodoaldo Neto
One Data Guy
  • The word you are looking for is *consecutive*. See [this answer](http://stackoverflow.com/a/8015107/398670). – Craig Ringer Oct 22 '12 at 11:32
  • possible duplicate of [Group By and Aggregate Sequential Numeric Values](http://stackoverflow.com/questions/8014577/group-by-and-aggregate-sequential-numeric-values) – Craig Ringer Oct 22 '12 at 11:33
  • @CraigRinger, thanks a lot, that's just what I'm looking for. Although I haven't found the solution yet, the word "consecutive" brings me much closer to it. – One Data Guy Oct 22 '12 at 11:33
  • @Craig Please notice that the key word for marking it as duplicate is `**exact** duplicate`. This is not the case. – Clodoaldo Neto Oct 22 '12 at 11:36
  • @Clodoaldo Re-reading the other question I'll pay that. They're very similar in essence, but not exactly the same. – Craig Ringer Oct 22 '12 at 11:37

2 Answers

create table t ("date" date, "value" int);
insert into t ("date", "value") values
    ('2011-10-31', 2),
    ('2011-11-01', 8),
    ('2011-11-02', 10),
    ('2012-09-13', 1),
    ('2012-09-14', 4),
    ('2012-09-15', 5),
    ('2012-09-16', 20),
    ('2012-10-30', 10);

Simpler and cheaper version:

select min("date"), max("date"), sum(value)
from (
    select
        "date", value,
        -- a date minus its rank is constant within a run of consecutive dates
        "date" - (dense_rank() over(order by "date"))::int g
    from t
) s
group by s.g
order by 1;
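
To see why this works, it can help to look at the intermediate columns. The sketch below assumes the table t defined above; the aliases rnk and g are only illustrative names. Within a run of consecutive dates, both the date and its rank grow by one per row, so their difference stays the same. After a gap the date jumps further than the rank does, so the difference changes and a new group starts.

```sql
-- Sketch: expose the rank and the derived group key for each row of t.
select
    "date",
    dense_rank() over (order by "date") as rnk,
    -- date - integer yields a date in Postgres; the cast is needed
    -- because dense_rank() returns bigint
    "date" - (dense_rank() over (order by "date"))::int as g
from t
order by "date";
```

With the sample rows, g should come out as 2011-10-30 for the first three rows, 2012-09-09 for the next four, and 2012-10-22 for the last one. Note that dense_rank() gives duplicate dates the same rank, so they would land in the same group; row_number() would not behave that way.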

My first try was more complex and expensive:

create temporary sequence s;
select min("date"), max("date"), sum(value)
from (
    select 
        "date", value, d,
        case 
            when lag("date", 1, null) over(order by s.d) is null and "date" is not null 
                then nextval('s')
            when lag("date", 1, null) over(order by s.d) is not null and "date" is not null 
                then lastval()
            else 0 
        end g
    from 
        t
        right join
        generate_series(
            (select min("date") from t)::date, 
            (select max("date") from t)::date + 1, 
            '1 day'
        ) s(d) on s.d::date = t."date"
) q
where g != 0
group by g
order by 1
;
drop sequence s;

The output:

    min     |    max     | sum 
------------+------------+-----
 2011-10-31 | 2011-11-02 |  20
 2012-09-13 | 2012-09-16 |  30
 2012-10-30 | 2012-10-30 |  10
(3 rows)
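
Since the question mentions lag(), here is a minimal sketch of another common formulation of the same idea. It is not one of the queries above. It assumes the table t from this answer and that each date appears at most once; the names is_start, marked and grouped are made up for illustration. Each row is flagged as the start of a run when the previous date is not exactly one day earlier, and a running sum of those flags numbers the runs.

```sql
select min("date"), max("date"), sum(value)
from (
    select
        "date", value,
        -- running count of run starts = group number
        sum(is_start) over (order by "date") as g
    from (
        select
            "date", value,
            -- date - date yields an integer number of days;
            -- the first row has no predecessor, so it falls into the else branch
            case
                when "date" - lag("date") over (order by "date") = 1 then 0
                else 1
            end as is_start
        from t
    ) marked
) grouped
group by g
order by 1;
```

This avoids the temporary sequence and generate_series, at the cost of one extra subquery level compared with the dense_rank() version.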
Clodoaldo Neto

Here is a way of solving it.

First, to get the beginning of each consecutive series, this query returns the first date of every run:

-- A row starts a run when no row exists for the previous day.
SELECT first.date
FROM raw_data first
     LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
WHERE prior_first.date IS NULL

Likewise, for the end of each consecutive series:

-- A row ends a run when no row exists for the next day.
SELECT last.date
FROM raw_data last
     LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
WHERE after_last.date IS NULL

You might consider making these views to simplify the queries that use them.

To form the group ranges, each beginning only needs the first ending on or after it:

CREATE VIEW beginnings AS
SELECT first.date
FROM raw_data first
     LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
WHERE prior_first.date IS NULL;

CREATE VIEW endings AS
SELECT last.date
FROM raw_data last
     LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
WHERE after_last.date IS NULL;

-- <= (not <) so that a single-day run, whose beginning is also its ending, is kept
SELECT MIN(raw.date), MAX(raw.date), SUM(raw.value)
FROM raw_data raw
  INNER JOIN (SELECT lo.date AS lo_date, MIN(hi.date) as hi_date
              FROM beginnings lo, endings hi
              WHERE lo.date <= hi.date
              GROUP BY lo.date) range
     ON raw.date >= range.lo_date AND raw.date <= range.hi_date
GROUP BY range.lo_date;
Marlin Pierce