0

Is there a way to get weekly data on active subscriptions if I have the time periods when each subscription was active? I have a table in BigQuery with a list of subscriptions:

+-----------------+---------+---------------+--------------+
| subscription_id | user_id | subscribed_at | cancelled_at |
+-----------------+---------+---------------+--------------+
|               1 |       2 | 2017-01-05    | 2017-06-03   |
|               2 |       3 | 2017-01-07    | 2017-09-15   |
|               3 |       4 | 2017-01-09    | NULL         |
|               4 |       1 | 2017-01-11    | 2017-05-27   |
|               5 |       3 | 2017-01-15    | NULL         |
+-----------------+---------+---------------+--------------+

I need to get a record for each unique subscription_id + active_week combination. Something like this:

+-----------------+---------+---------------+--------------+-------------+
| subscription_id | user_id | subscribed_at | cancelled_at | active_week |
+-----------------+---------+---------------+--------------+-------------+
|               1 |       2 | 2017-01-05    | 2017-06-03   |      201701 |
|               2 |       3 | 2017-01-07    | 2017-09-15   |      201701 |
|               1 |       2 | 2017-01-05    | 2017-06-03   |      201702 |
|               2 |       3 | 2017-01-07    | 2017-09-15   |      201702 |
|               3 |       4 | 2017-01-09    | NULL         |      201702 |
|               4 |       1 | 2017-01-11    | 2017-05-27   |      201702 |
|               1 |       2 | 2017-01-05    | 2017-06-03   |      201703 |
|               2 |       3 | 2017-01-07    | 2017-09-15   |      201703 |
|               3 |       4 | 2017-01-09    | NULL         |      201703 |
|               4 |       1 | 2017-01-11    | 2017-05-27   |      201703 |
|               5 |       3 | 2017-01-15    | NULL         |      201703 |
|             ... |      ...| ...           |...           |         ... |
+-----------------+---------+---------------+--------------+-------------+

I tried to go from this, but no luck.

SELECT
  SPLIT(RPAD('', 1 + DATEDIFF(sub.ended_date, sub.started_date), '.'),'') AS weeks,
  sub.subscription_Id,
  sub.customer_id
FROM (
  SELECT
    subscribed_at AS started_date,
    CASE
      WHEN cancelled_at IS NULL THEN TIMESTAMP(CURRENT_DATE())
      ELSE TIMESTAMP(cancelled_at)
    END AS ended_date,
    subscription_id,
    customer_id
  FROM
    [subscriptions]) AS sub

Your help is very much appreciated!

Best, Denys

syned
  • 101
  • 1
  • 4
  • 1
    Welcome to Stack Overflow! Pure code-writing requests are off-topic on Stack Overflow -- we expect questions here to relate to specific programming problems -- but we will happily help you write it yourself! Tell us [what you've tried](https://stackoverflow.com/help/how-to-ask), and where you are stuck. This will also help us answer your question better. – WhatsThePoint Nov 29 '17 at 10:40
  • This should help you get started: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#extract – Graham Polley Nov 29 '17 at 12:02
  • just show what you tried and what issue you have, so we can pick it from there :o) – Mikhail Berlyant Nov 30 '17 at 14:23
  • @MikhailBerlyant I tried using a query from [here](https://stackoverflow.com/questions/38306016/populating-a-table-with-all-dates-in-a-given-range-in-google-bigquery) and modifying it to take data from my table, but no luck. – syned Nov 30 '17 at 14:55
  • So just show it! – Mikhail Berlyant Nov 30 '17 at 14:57
  • @MikhailBerlyant added the code I tried in the question. Thanks. – syned Nov 30 '17 at 17:27

1 Answers1

0

Below is for BigQuery Standard SQL

#standardSQL
WITH temp AS (
  SELECT subscription_id, user_id, 
    PARSE_DATE('%Y-%m-%d', subscribed_at) subscribed_at, 
    PARSE_DATE('%Y-%m-%d', cancelled_at) cancelled_at
  FROM `project.dataset.subscriptions`
), weeks AS (
  SELECT 
    wk week_start, 
    DATE_ADD(wk, INTERVAL 6 DAY) week_end
  FROM (
    SELECT GENERATE_DATE_ARRAY(
        DATE_TRUNC(MIN(subscribed_at), WEEK), 
        CURRENT_DATE(), INTERVAL 1 WEEK) weeks
    FROM temp
  ), UNNEST(weeks) wk
)
SELECT subscription_id, user_id, subscribed_at, cancelled_at, week_start, week_end
FROM weeks
JOIN temp
ON subscribed_at <= week_end
AND IFNULL(cancelled_at, CURRENT_DATE()) > week_start

Please note: I assume your subscribed_at and cancelled_at fields are of STRING data type - that is why there is an extra temp subquery takes care of casting them as DATE fields. If respective fields in your original table are already of DATE data type - you should remove use of temp and just use `project.dataset.subscriptions` instead.

You can test / play with above query using dummy data from your question as below

#standardSQL
WITH `project.dataset.subscriptions` AS (
  SELECT 1 subscription_id, 2 user_id, '2017-01-05' subscribed_at, '2017-06-03' cancelled_at UNION ALL
  SELECT 2, 3, '2017-01-07', '2017-09-15' UNION ALL
  SELECT 3, 4, '2017-01-09', NULL UNION ALL
  SELECT 4, 1, '2017-01-11', '2017-05-27' UNION ALL
  SELECT 5, 3, '2017-01-15', NULL
), temp AS (
  SELECT subscription_id, user_id, 
    PARSE_DATE('%Y-%m-%d', subscribed_at) subscribed_at, 
    PARSE_DATE('%Y-%m-%d', cancelled_at) cancelled_at
  FROM `project.dataset.subscriptions`
), weeks AS (
  SELECT 
    wk week_start, 
    DATE_ADD(wk, INTERVAL 6 DAY) week_end
  FROM (
    SELECT GENERATE_DATE_ARRAY(
        DATE_TRUNC(MIN(subscribed_at), WEEK), 
        CURRENT_DATE(), INTERVAL 1 WEEK) weeks
    FROM temp
  ), UNNEST(weeks) wk
)
SELECT subscription_id, user_id, subscribed_at, cancelled_at, week_start, week_end
FROM weeks
JOIN temp
ON subscribed_at <= week_end
AND IFNULL(cancelled_at, CURRENT_DATE()) > week_start
ORDER BY week_start, subscription_id

Also note: in above query - week starts at Sunday
If you want to count week starting from Monday - you should adjust a little week_start and week_end in weeks CTE - like below

    DATE_ADD(wk, INTERVAL 1 DAY) week_start, 
    DATE_ADD(wk, INTERVAL 7 DAY) week_end
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230