3

I am using Postgres 8.3 (no choice in the version at this time). My raw data table is as follows:

ID  start_time               finish_time
01   2013-01-23 10:47:52-05  2013-02-25 11:18:36-05

I can count between the two timestamps:

--relevant line in view creation query:
date_part('epoch',(finish_time - start_time)::interval)/3600 as hours

I do not want to include weekends. Also, I only want to count 09:00 - 17:30.

In a perfect world, I would subtract an hour for lunch per day as well and eventually I would also like to include company holidays but I would just like to solve this working hours portion first.

Any advice on how to approach this? I am pretty new to SQL. I am open to using SQLalchemy as well, but am also a beginner there as well and feel more comfortable with straight SQL.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
lostinthebits
  • 661
  • 2
  • 11
  • 24

2 Answers2

6

Imagine that you have a table of work minutes. (Or build one. This one isn't tested, so it might contain timezone and fencepost errors.)

create table work_minutes (
  work_minute timestamp primary key
);

insert into work_minutes
select work_minute
from 
  (select generate_series(timestamp '2013-01-01 00:00:00', timestamp '2013-12-31 11:59:00', '1 minute') as work_minute) t
where extract(isodow from work_minute) < 6
  and cast(work_minute as time) between time '09:00' and time '17:30'

Now your query can count minutes, and that's just dead simple.

select count(*)/60.0 as elapsed_hrs
from work_minutes
where work_minute between '2013-01-23 10:47:52' and '2013-02-25 11:18:36'

elapsed_hours
--
196.4

You can decide what to do with fractional hours.

There can be a substantial difference between calculating by minutes and calculating by hours, depending on how you treat the start time and such. Calculations based on hours might not count a lot of minutes in an hour that extends beyond the stop time. Whether it matters is application-dependent.

You can generate a virtual table like this on the fly with generate_series(), but a base table like this only needs about 4 million rows to cover 30 years, and this kind of calculation on it is really fast.

Later . . .

I see that Erwin Brandstetter covered the use of generate_series() for modern PostgreSQL; it won't work in version 8.3, because 8.3 doesn't support common table expressons or generate_series(timestamp, timestamp). Here's a version of Erwin's query that avoids those problems. This isn't a completely faithful translation; the calculations differ by an hour. That's probably a fencepost error on my part, but I don't have time to dig into the details right now.

select count(*) from 
(select timestamp '2013-01-23 10:47:52-05' + (n || ' hours')::interval
from generate_series(  0
                     , (extract(days from timestamp '2013-02-25 11:18:36-05' 
                                        - timestamp '2013-01-23 10:47:52-05')::integer * 24) ) n
where extract(isodow from (timestamp '2013-01-23 10:47:52-05' + (n || ' hours')::interval)) < 6
  and (timestamp '2013-01-23 10:47:52-05' + (n || ' hours')::interval)::time >= '09:00'::time
  and (timestamp '2013-01-23 10:47:52-05' + (n || ' hours')::interval)::time <  '17:30'::time
 ) t

A table-based solution has the advantage of easily handling management whimsy. "Hey! Our dog had seven puppies! Half day today!" It also scales well, and it works on virtually every platform without modification.

If you use generate_series(), wrap it in a view. That way, arbitrary changes to the rules can be managed in one place. And if the rules become too complicated to maintain easily within the view, you can replace the view with a table having the same name, and all the application code, SQL, and stored procedures and functions will just work.

Community
  • 1
  • 1
Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • I removed my answer since it won't work with Postgres 8.3. You got that right. I moved the code to [my related answer for modern Postgres](http://stackoverflow.com/questions/1839319/calculate-working-hours-between-2-dates-in-postgresql/17282639#17282639) instead. – Erwin Brandstetter Jun 25 '13 at 15:56
  • @ErwinBrandstetter: And I linked to it. You might be able to use generate_series() with integer arguments, but it will be more complicated. If I have time to work on that, I'll add it here. (And in your answer, if it seems appropriate.) – Mike Sherrill 'Cat Recall' Jun 25 '13 at 16:10
  • You may be interested: I advanced your work some more in another answer. – Erwin Brandstetter Jun 26 '13 at 02:45
3

This advances the work in progress provided by @Catcall:

SELECT count(*)
FROM   generate_series(0, extract(days from timestamp '2013-02-25 11:18:36' 
                                          - timestamp '2013-01-23 10:47:52')::int * 24) n
WHERE  extract(ISODOW from timestamp '2013-01-23 10:47:52' + n * interval '1h') < 6
AND   (timestamp '2013-01-23 10:47:52' + n * interval '1h')::time >= '09:00'::time
AND   (timestamp '2013-01-23 10:47:52' + n * interval '1h')::time <  '17:30'::time
  • timestamp '2013-01-23 10:47:52-05' is not doing what you seem to think. The time zone offset -05 is discarded because you cast the literal to timestamp [without timezone]. You probably wanted timestamptz '2013-01-23 10:47:52-05'. However, working hours are usually bound to local time, so it could be argued that timestamp [without time zone] is a better fit to begin with. More in this related answer:
    Ignoring timezones altogether in Rails and PostgreSQL

  • This form is much more efficient

    timestamptz '2013-01-23 10:47:52-05' + n * interval '1h'
    

    than this:

    timestamptz '2013-01-23 10:47:52-05' + (n || ' hours')::interval
    

    You can simply multiply any interval.

Function

I developed further and wrapped it into an SQL function.
Still not precise, but it fixes a systematic error and has a smaller rounding error due to half-hour units.

CREATE OR REPLACE FUNCTION f_worktime83(t_start timestamp
                                      , t_end timestamp)
  RETURNS interval AS
$func$

SELECT (count(*) - 1) * interval '30 min' -- fix off-by-one error
FROM   (
   SELECT $1 + generate_series(0, (extract(epoch FROM $2 - $1)/1800)::int)
             * interval '30 min' AS t
   ) sub
WHERE  extract(ISODOW from t) < 6
AND    t::time >= '09:00'::time
AND    t::time <  '17:30'::time

$func$  LANGUAGE sql

Call:

SELECT f_worktime83('2013-06-26 10:47:52', '2013-06-26 11:10:51')
  • Add values to generate_series() directly, simplifying the code.
  • Get a (rounded) precise number of time units by extracting the epoc (number of seconds) and dividing it by 1800 (seconds in 30 minutes).
  • fix off-by-1 error that comes including the upper border in the count.
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • When I try to create that function, I get the following error: ERROR: function generate_series(integer, double precision) does not exist LINE 8: SELECT $1 + generate_series(0, extract(epoch FROM $2 - $1... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. – lostinthebits Jul 08 '13 at 15:35
  • @lostinthebits: The cast to integer was outside the closing bracket by mistake. I fixed that, it should work now. – Erwin Brandstetter Jul 08 '13 at 15:50