1

I have a table in a Postgres DB like this:

person    |     eventdate     |  type 
--------------------------------------
<uuid-1>  |   2016-05-14      |   300
<uuid-3>  |   2016-05-14      |   300
<uuid-1>  |   2016-05-15      |   301
<uuid-1>  |   2016-05-16      |   301
<uuid-1>  |   2016-05-18      |   304
<uuid-1>  |   2016-05-22      |   300
<uuid-2>  |   2016-05-22      |   304
<uuid-2>  |   2016-05-27      |   301
<uuid-1>  |   2016-05-30      |   300
<uuid-1>  |   2016-06-01      |   300
<uuid-2>  |   2016-06-15      |   501
<uuid-2>  |   2016-06-16      |   301
<uuid-4>  |   2016-06-16      |   300
<uuid-5>  |   2016-06-20      |   300
<uuid-1>  |   2016-06-21      |   300
<uuid-2>  |   2016-06-21      |   300
<uuid-2>  |   2016-06-23      |   301
<uuid-2>  |   2016-06-30      |   300
<uuid-3>  |   2016-06-30      |   300
<uuid-4>  |   2016-06-30      |   300

The table contains non consecutive day entries for each day an employee is absence for different reasons (types) of absence. However an absence period could span several of these days and any absence entry that is within 5 days of a previous absence of the same type is still considered part of the same absence 'period'.

I need to get output for each employees absences periods with start and end date of those periods, plus the total number of days within that multi-date-spanned period.

This is further complicated by the fact that different types of absence are considered the same for the purpose of this report. So in the example above, type 300, 301, 304 would be treated as the same.

So from my example above the following would be what I am after ...

person    |     startdate     |       enddate     |  days   |  type 
--------------------------------------------------------------------
<uuid-1>  |     2016-05-14    |     2016-05-22    |   5     |   300
<uuid-3>  |     2016-05-14    |     2016-04-14    |   1     |   300
<uuid-2>  |     2016-05-22    |     2016-04-27    |   2     |   304
<uuid-1>  |     2016-05-30    |     2016-06-01    |   2     |   300
<uuid-2>  |     2016-06-15    |     2016-06-15    |   1     |   501
<uuid-2>  |     2016-06-16    |     2016-06-16    |   1     |   301
<uuid-4>  |     2016-06-16    |     2016-06-16    |   1     |   300
<uuid-5>  |     2016-06-20    |     2016-06-20    |   1     |   300
<uuid-1>  |     2016-06-21    |     2016-06-21    |   1     |   300
<uuid-2>  |     2016-06-21    |     2016-06-23    |   2     |   300
<uuid-2>  |     2016-06-30    |     2016-06-30    |   1     |   300
<uuid-3>  |     2016-06-30    |     2016-06-30    |   1     |   300
<uuid-4>  |     2016-06-30    |     2016-06-30    |   1     |   300

How do I query this table to this output?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Ben Drury
  • 1,356
  • 2
  • 16
  • 34
  • The result doesn't add up. There is no type `503` in the example. And it's unclear how you determine the `type` for each period. Postgres version and table definition showing all data types and constraints should be there, too. – Erwin Brandstetter Aug 17 '16 at 00:33
  • @ErwinBrandstetter I am using the first value for the type, but the lowest will also work. The type output is not so important as making sure that the right types are blocked together. – Ben Drury Aug 17 '16 at 07:45
  • My answer is blocking *all* types together. Not sure, maybe you wanted to keep 501 separate? – Erwin Brandstetter Aug 17 '16 at 13:34

1 Answers1

2

It's unclear how you determine the type for each period. I chose the minimum number.

Assuming this basic table definition:

CREATE TABLE tbl (person text, eventdate date, type int);

Basically, I suggest window functions in two nested subqueries to identify members of the same period (island). Then aggregate:

SELECT person, period
     , min(eventdate) AS startdate
     , max(eventdate) AS enddate
     , count(*)       AS days
     , min(type)      AS type
FROM  (
   SELECT person, eventdate, type
        , count(gap) OVER (PARTITION BY person ORDER BY eventdate) AS period
   FROM  (
      SELECT person, eventdate, type
           , CASE WHEN lag(eventdate) OVER (PARTITION BY person ORDER BY eventdate)
                     > eventdate - 6  -- within 5 days
                  THEN NULL           -- same period
                  ELSE TRUE           -- next period
             END AS gap
      FROM   tbl
      ) sub
   ) sub
GROUP  BY person, period
ORDER  BY person, period;

Result (based on your example data):

  person  | period | startdate  |  enddate   | days | type
----------+--------+------------+------------+------+------
 <uuid-1> |      1 | 2016-05-14 | 2016-05-22 |    5 |  300
 <uuid-1> |      2 | 2016-05-30 | 2016-06-01 |    2 |  300
 <uuid-1> |      3 | 2016-06-21 | 2016-06-21 |    1 |  300
 <uuid-2> |      1 | 2016-05-22 | 2016-05-27 |    2 |  301
 <uuid-2> |      2 | 2016-06-15 | 2016-06-23 |    4 |  300
 <uuid-2> |      3 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-3> |      1 | 2016-05-14 | 2016-05-14 |    1 |  300
 <uuid-3> |      2 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-4> |      1 | 2016-06-16 | 2016-06-16 |    1 |  300
 <uuid-4> |      2 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-5> |      1 | 2016-06-20 | 2016-06-20 |    1 |  300

If the same day for the same person can be entered multiple times with different types, and you only want to count distinct days, make it: count(DISTINCT eventdate) AS days.

Related, with detailed explanation:

BTW, eventdate - 6 works for data type date, but not for timestamp:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228