2

I'm trying to parse a logging table in PostgreSQL 9.5. Let's imagine I'm logging SMS sent from all the phones belonging to my company. For each record I have a timestamp and the phone ID.
I want to display how many SMS are sent by week but only for the phones that send SMS each week of the year.

My table is as following:

╔════════════╦══════════╗
║ event_date ║ phone_id ║
╠════════════╬══════════╣
║ 2016-01-05 ║    1     ║
║ 2016-01-06 ║    2     ║
║ 2016-01-13 ║    1     ║
║ 2016-01-14 ║    1     ║
║ 2016-01-14 ║    3     ║
║ 2016-01-20 ║    1     ║
║ 2016-01-21 ║    1     ║
║ 2016-01-22 ║    2     ║
╚════════════╩══════════╝

And I would like the following display

╔══════════════╦══════════╦══════════════╗
║ week_of_year ║ phone_id ║ count_events ║
╠══════════════╬══════════╬══════════════╣
║  2016-01-04  ║    1     ║       1      ║
║  2016-01-11  ║    1     ║       2      ║
║  2016-01-18  ║    1     ║       2      ║
╚══════════════╩══════════╩══════════════╝

Only phone_id 1 is displayed because this is the only ID with events in each week of the year.

Right now, I can query to group by week_of_year and phone_IDs. I have the following result:

╔══════════════╦══════════╦══════════════╗
║ week_of_year ║ phone_id ║ count_events ║
╠══════════════╬══════════╬══════════════╣
║  2016-01-04  ║    1     ║       1      ║
║  2016-01-04  ║    2     ║       1      ║
║  2016-01-11  ║    1     ║       2      ║
║  2016-01-11  ║    3     ║       1      ║
║  2016-01-18  ║    1     ║       2      ║
║  2016-01-18  ║    2     ║       1      ║
╚══════════════╩══════════╩══════════════╝

How can I filter in order to only keep phone_ids occurring for each week of year? I tried various subqueries but I must acknowledge I'm stuck. :-)

About the definition of week_of_year: as I want to consolidate data per week, I'm using in my select: date_trunc('week', event_date)::date as interval. And then I group by interval to have the number of SMS per phone_id per week.

About the date range, I just want this starting in 2016, I'm using a where condition in my query to ignore everything before: WHERE event_date > '2016-01-01'

I saw the request to create a SQL Fiddle but I have issues to do so, will do it if I'm not lucky enough to have a good hint to solve this.

Created a quick SQL Fiddle, hope it would useful.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
Tigrann
  • 23
  • 4

4 Answers4

0

Below assumes that your table represents a full year. You didn't specify that.

To find all phones that send SMSs every week, you can do something like

select phone_id, count(distinct extract(week from event_date)) as cnt
from table
having cnt >= 51

Note, I use 51, but the notion of a week in a year is a bit fuzzy, they actually have 52 or 53 (partila) weeks. But 51 should be fine.

Anyway, And then you simply do

select phone_id, date_trunc('week', event_date), count(*) 
from table
where phone_id in (.. query above ..)
group by 1, 2

Would be great if you provided sample data in SQLFiddle

Marcin Zukowski
  • 4,281
  • 1
  • 19
  • 28
0

The HAVING clause will allow you to filter off any phones which did not have activity each week in the calendar year. In my answer below I use a common table expression (CTE), since I actually perform two queries of a similar nature. The first query on X reproduces the result you already have, while the subquery finds all phones which had activity in each week of the year (i.e. 52 weeks). I assume the year you want to query is 2015, but you are free to change this as you like.

WITH X AS (
    SELECT DATE_TRUNC('week', event_date)::date AS week_of_year,
        phone_id, COUNT(*) AS count_events
    FROM messages
    GROUP BY week_of_year, phone_id
    WHERE EXTRACT(YEAR FROM event_date)::text = '2016'
)
SELECT x1.week_of_year, x1.phone_id, x1.count_events
FROM X x1 INNER JOIN
(
    SELECT phone_id, COUNT(*)
    FROM X
    GROUP BY phone_id
    HAVING COUNT(*) =
    (
        SELECT COUNT(DISTINCT DATE_TRUNC('week', event_date)::date)
        FROM messages
        WHERE EXTRACT(YEAR FROM event_date)::text = '2016'
    )
) x2
    ON x1.phone_id = x2.phone_id
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • @ErwinBrandstetter I updated my answer to use a subquery to dynamically obtain the number of unique weeks in a given year (2016) in this case. – Tim Biegeleisen Mar 31 '16 at 03:55
0

Your concept of year seems very fuzzy. Let me instead assume that you mean for a period of time over the range of your data.

with w as (
      select date_trunc('week', event_date) as wk, phone_id, count(*) as cnt
      from messages
      group by 1, 2
     ),
     ww as (
      select w.*,
             min(wk) over () as min_wk,
             max(wk) over () as max_wk,
             count(*) over (partition by phone_id) as numweeks
      from w
     )
select ww.wk, ww.phone_id, ww.cnt
from ww
where (max_wk - min_wk) / 7 = cnt - 1;

The first CTE just aggregates the data by week and phone id. The second CTE calculates the first and last week in the data (these could be replaced with constants), as well as the number of weeks for a given phone.

Finally, the where clause makes sure that the number of weeks spans the period of time.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

I want to display how many SMS are sent by week but only for the phones that send SMS each week of the year.

The tricky part is the exact definition of "year" and "week". By default I would assume ISO definitions.

Weeks of the year are defined like this (quoting the Postgres manual):

Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January

It follows logically that the 28th of December falls into the last week of each year. And it would make sense to assume the current time as upper border for the current year - or else you'll have to define it explicitly.

Either way, the first day of the first week can still be part of the previous year, or vice versa at the end of the year. For example, the ISO weeks of 2015 include parts of 2014 and 2016. And the number of weeks is 52 most of the time, but not always - which invalidates at least one of the queries provided so far. Consider this demo:

SELECT extract(year FROM jan1)::text              AS year
     , date_trunc('week', jan4)::date             AS ts_min  -- incl. lower bound
     , date_trunc('week', dec28_or_now)::date + 7 AS ts_max  -- excl. upper bound
     , extract(week FROM  dec28_or_now)           AS weeks
FROM  (SELECT jan1, jan1 + interval '3 days' AS jan4
            , least(jan1 + interval '1 year - 4 days', now()) AS dec28_or_now
       FROM   generate_series(date '2010-01-01'
                            , date '2016-01-01'
                            , interval '1 year') jan1
      ) sub;

Result:

 year |   ts_min   |   ts_max   | weeks
------+------------+------------+-------
 2010 | 2010-01-04 | 2011-01-03 |    52
 2011 | 2011-01-03 | 2012-01-02 |    52
 2012 | 2012-01-02 | 2012-12-31 |    52
 2013 | 2012-12-31 | 2013-12-30 |    52
 2014 | 2013-12-30 | 2014-12-29 |    52
 2015 | 2014-12-29 | 2016-01-04 |    53
 2016 | 2016-01-04 | 2016-04-04 |    13  -- weeks so far

You did not define the exact data type of event_date. It can make a difference. Be aware that the exact definition of "year" and "week" depend on the time zone you are in. It can still be Sunday, Dec 31 in New York, while it's already Monday, Jan 1 in Berlin.

Postgres assumes the time zone setting of your current session when extracting year or week numbers or using date_trunc(). If your time zone can vary at all be sure to use the data type timestamp with time zone for event_date to exclude one more entry point for mistakes. Details:

My query below works either way. I encapsulated the logic from above in a similar fashion in a CTE of the query, so you only provide the year of interest once:

WITH year_data AS (
   SELECT date_trunc('week', jan4)                             AS ts_min  -- incl. bound
        , date_trunc('week', dec28_or_now) + interval '1 week' AS ts_max  -- excl. bound
        , extract(week FROM  dec28_or_now)                     AS weeks
   FROM  (SELECT jan1 + interval '3 days' AS jan4
               , least(jan1 + interval '1 year - 4 days', now()) AS dec28_or_now
          FROM  (SELECT date '2016-01-01' AS jan1) t  -- provide Jan 1 of the year here!
         ) sub
   )
SELECT week_start, phone_id, count_events
FROM  (
   SELECT t.phone_id, date_trunc('week', t.event_date) AS week_start
        , count(*) AS count_events
        , count(*) OVER (PARTITION BY t.phone_id) AS weeks
   FROM   tbl t
   JOIN   year_data y ON t.event_date >= y.ts_min
                     AND t.event_date <  y.ts_max
   GROUP  BY 1, 2
   ) sub
WHERE  sub.weeks = (SELECT weeks FROM year_data);

For the technique to get the count of sms and count of weeks with sms in the same query level:

Also important: this query uses sargable predicates and can use indexes on event_date (as opposed to all other queries provided so far).

Ideally, you have an index on (event_date, phone_id) to allow index-only scans for best performance.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228