1

Short and simple, I want to get a list of unique hours and their count representative.

SELECT DISTINCT(date_trunc('hour', occurred)) as time, COUNT(*)
  FROM log
  WHERE occurred BETWEEN '2014-01-01 00:00' AND '2014-01-01 23:59'
GROUP BY occurred;

This doesn't work, because I get way more results than 24.
So I went ahead and tried:

SELECT DISTINCT(occurred), COUNT(*)
    FROM log
    WHERE occurred BETWEEN .. AND ..
GROUP BY date_trunc('hour', occurred);

This is a invalid syntax and obviously me just winging it, hence I need help. How would one go about solving this?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Torxed
  • 22,866
  • 14
  • 82
  • 131
  • `distinct` is ***NOT*** a function. It always operates on **all** columns of the result. –  Nov 17 '14 at 09:41
  • I've always assumed it was a short-hand-version of doing `DISTINCT ON(occurred) occurred, ...`, I've guessed wrong then. – Torxed Nov 17 '14 at 09:43
  • @a_horse_with_no_name Go ahead and conduct a answer: `SELECT DISTINCT ON (time) date_trunc('hour', occurred) as time, COUNT(*) FROM log WHERE occurred BETWEEN '2014-01-01 00:00' AND '2014-01-01 23:59' GROUP BY occurred;` worked. – Torxed Nov 17 '14 at 09:47
  • Perhaps you need `date_part('hour', occurred)` instead here? – vyegorov Nov 17 '14 at 11:07
  • @vyegorov: `date_part()` is a non-standard (old) equivalent of the [standard SQL `EXTRACT`](http://www.postgresql.org/docs/current/interactive/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT) I suggested in my answer. – Erwin Brandstetter Nov 17 '14 at 13:52
  • @ErwinBrandstetter, it doesn't means that `date_part()` should be avoided, just 2 different functions for the same thing. And if you'll check `EXPLAIN` output, `extract` is converted into `date_part` calls internally. – vyegorov Nov 17 '14 at 18:31
  • @vyegorov: Did not want to imply it's deprecated. Just that `EXTRACT` is the standard SQL form. – Erwin Brandstetter Nov 17 '14 at 20:23

1 Answers1

1

You do not need DISTINCT here at all.

If you are just interested in a list of "hours" where anything happened (works for any time range):

SELECT date_trunc('hour', occurred) AS hour, COUNT(*) AS ct
FROM   log
WHERE  occurred >= '2014-01-01 0:0'
AND    occurred <  '2014-01-02 0:0'  -- do not use between for this
GROUP  BY 1
ORDER  BY 1;

BETWEEN includes upper and lower bound, which is typically not suitable for timestamp ranges:

What if nothing happens for more than an hour? If you still want a row in the result for hours without activity (ignoring the date in this variant!):

SELECT hour, COUNT(l.occurred) AS ct
FROM   generate_series (0,23) h(hour)
LEFT   JOIN log l ON occurred >= '2014-01-01 0:0'
                 AND occurred <  '2014-01-02 0:0'  -- do not use between for this
                 AND EXTRACT(hour FROM occurred)::int = h.hour
GROUP  BY hour
ORDER  BY hour;

generate_series() provides the full set of hours per day.
The LEFT JOIN preserves all of them.
Only count actual log entries.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228