2

I'm a fairly new Postgres user, I'm sure there's an answer to this already but I can't find it.
I need to analyze some data in an activity log table, grouping the the results by time period.

A simple version of the problem would be a table with three fields:

    Column    |           Type           |              Modifiers
--------------+--------------------------+-------------------------------------
 period_start | timestamp with time zone | not null
 user_id      | text                     | not null
 action       | text                     | not null

The action string I want to capture could be 'create_entry' (yes, I know that sucks as good db design but I'm stuck with it)

The output I'm looking for is a report showing count of 'create_entry' actions by year and month. Something like:

 Year | Month | Entries
------+-------+----------
 2013 |  12   | 14345
 2014 |   1   | 9876
 2014 |   2   | 10234

My instinct is to attack this with sub-queries but looking around there seems to be a suggestion that loops might be the way to go.

Either way, I'm out of my depth and looking for a push in the right direction.

EDIT

A supplementary question based on a request that came in while I was typing this.. A variation on the above that would show a cumulative total month by month (although I'd like to think I could figure that out by myself based on the approach to the above - I'm just adding this in case it's relevant to the approach).

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
PerryW
  • 1,426
  • 1
  • 15
  • 25

2 Answers2

3

Use date_trunc() to truncate timestamps to the month and cover both year and month in a single column. You can use to_char() to format that any way you like.

To get the running count you mention in your supplementary question, add a window function:

SELECT to_char(date_trunc('month', period_start), 'Mon YYYY') AS month
     , count(*) AS month_ct
     , sum(count(*)) OVER (ORDER BY date_trunc('month', period_start)) AS running_ct
FROM   activity_log
WHERE  action = 'create_entry'
GROUP  BY date_trunc('month', period_start);

Window functions are executed after aggregate functions, so we can run a window function over an aggregate in the same query level. See:

It's essential to use the same basic expression in the window function and in GROUP BY: date_trunc('month', period_start).

Since you operate on timestamptz, fix possible corner case issues with your time zone by truncating to the time boundary in your time zone. Using 'Europe/Vienna' in my example:

SELECT to_char(date_trunc('month', period_start AT TIME ZONE 'Europe/Vienna'), 'Mon YYYY') AS month
     , COUNT(*) AS month_ct
     , sum(count(*)) OVER (ORDER BY date_trunc('month', period_start AT TIME ZONE 'Europe/Vienna')) AS running_ct
FROM   activity_log
WHERE  action = 'create_entry'
GROUP  BY date_trunc('month', period_start AT TIME ZONE 'Europe/Vienna');

Since Postgres 12 we could also feed the time zone as 3rd parameter to date_trunc(), but that would return timestamptz, which would result in the next corner case issue with to_char(). You'll want to display local time units, so AT TIME ZONE is better.

fiddle
Old sqlfiddle

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
2

If I understand correctly, you just want to GROUP BY the years and months in your table, for every row that has an action of create_entry:

SELECT
  DATE_PART('YEAR', period_start) as Year,
  DATE_PART('MONTH', period_start) as Month,
  COUNT(*) as Entries
FROM activity_log
WHERE action = 'create_entry'
GROUP BY Year, Month;

SQL Fiddle

Mike Christensen
  • 88,082
  • 50
  • 208
  • 326
  • This is why I love StackOverflow - DATE_PART and date_trunc - I'd never seen them before – PerryW Feb 26 '14 at 23:58
  • Not sure if there's any difference between them. @Erwin would know, he's much more of an expert than I. – Mike Christensen Feb 27 '14 at 00:00
  • Oh. `DATE_PART` returns a number (2013, 2, etc) and `DATE_TRUNC` returns a time stamp, truncated to that part. So, my way would look more like your example output. – Mike Christensen Feb 27 '14 at 00:01
  • Sometimes I hate having to choose a preferred answer... Both excellent, both +1'd. I'm not qualified to say which is the 'best', I suspect it's probably @Erwin's But I'm choosing this one because it's the solution that I've actually used. – PerryW Feb 27 '14 at 23:32
  • @PerryW: I proposed `date_trunc()`, straying from your proposed result format, because aggregating on a single column is cheaper than on two columns. The most interesting bit is the running count, though, and that it only needs a single query level for that. – Erwin Brandstetter Feb 27 '14 at 23:51