13

So say we have 3 fields, username, start_date, end_date

Users start and stop multiple records, eg below bob has started and stopped two records.

bob   1/2/13 11:00  1/2/13 13:00
jack  1/2/13 15:00  1/2/13 18:00
bob   2/2/13 14:00  1/2/13 19:00

I need to know the average time taken (ie diff between start and end), in hours, for each user (ie group by user, not just for each row).

I can't quite get my head around how to do the diff, average AND group by? Any help?

vyegorov
  • 21,787
  • 7
  • 59
  • 73
Nick Foote
  • 2,425
  • 9
  • 36
  • 47

2 Answers2

21

You don't specify the granularity you want for the diff. This does it in days:

select username, avg(end_date - start_date) as avg_days
from mytable
group by username

If you want the difference in seconds, use datediff():

select username, avg(datediff(ss, start_date, end_date)) as avg_seconds
...

datediff can measure the diff in any time unit up to years by varying the first parameter, which can be ss, mi, hh, dd, wk, mm or yy.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • Wow, ok, simpler than I imagined. Guess I was making it more complex in my head than it needed to be. – Nick Foote Feb 15 '13 at 13:43
  • Which version of PostgreSQL you're using here? Do you mind creating a SQLFiddle for this snippet? @NickFoote, I doubt this will work on a recent enough version of PostgreSQL. – vyegorov Feb 15 '13 at 14:46
  • psql doesn't have `datediff()` function. @vyegorov, why did you remove `postgresql` tag from the question? – HEX Dec 13 '19 at 14:34
  • huh. I'm getting arithmetic overflow exceptions. (using the 'ss' option of `datediff`) – IAmJersh Apr 09 '20 at 10:12
  • @TheHitchenator Dates more than 66.5 years apart have a difference in seconds greater than a signed 4 byte integer can hold. Big deal. – Bohemian Apr 09 '20 at 22:39
  • @Bohemian Surely not the cause... the system logging these dates has only been running for two years! but thanks for the tip - gives me some backing for what to look into! – IAmJersh Apr 11 '20 at 08:31
  • @TheHitchenator how long it’s been running does not necessarily govern what data is in the database - stuff can sneak in. Try `select min(myDateColumn), max(myDateColumn) from myTable` and see how it looks. – Bohemian Apr 11 '20 at 08:49
  • @Bohemian As expected, the earliest date is from 2018 and latest date is from 2020... no outliers. This is mostly why I'm perplexed by the overflow - I already kinda knew they wouldn't be there due to checks made for a BSI standard. – IAmJersh Apr 14 '20 at 16:04
  • @TheHitchenator any null dates? – Bohemian Apr 14 '20 at 18:28
  • @Bohemian yes, but I was filtering those out in the WHERE clause... I'll try again, possible I missed a column in that. Thanks for the help brianstorming! – IAmJersh Apr 15 '20 at 08:34
  • 1
    @TheHitchenator AFAIK non-index conditions in the where clause fire *after* result rows are created, so depending on the optimizer choice it’s possible a null date is involved with a calculation even though you are filtering them out of the final result. – Bohemian Apr 15 '20 at 19:10
5
SELECT [username], AVG(TIMESTAMPDIFF(HOUR, start_date, end_date))
FROM [table]
GROUP BY [username]
Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60
Marc Fischer
  • 1,296
  • 1
  • 13
  • 16