2

Let's say you have a user table that has at least the date the user signed up and an id.

Now let's say you have a separate table that tracks an action like a payment that can happen at any point in the user's lifetime. (Say like an in-app purchase.) In that table we track the userId, payment date, and an id for the payment.

So we have something that looks like this to get our schema set up:

CREATE TABLE users (
  UserId INT,
  AddedDate DATETIME
);

CREATE TABLE payments (
  PaymentId INT,
  UserId INT,
  PaymentDate Datetime
);

Now you want a table that shows weekly cohorts. A table that looks something like this:

Week       size w1  w2  w3  w4  w5  w6  w7
2017-08-28  1   0   0   0   1   0   0   0
2017-09-04  3   1   0   2   0   1   1   2
2017-09-11  2   0   0   1   0   0   0   1
2017-09-18  6   3   1   4   3   1   1   2
2017-09-25  2   1   1   1   0   1   2   0
2017-10-02  7   5   2   3   4   3   1   0
2017-10-09  7   4   5   1   2   5   0   0
2017-10-16  2   1   2   1   1   0   0   0
2017-10-23  7   5   4   4   3   0   0   0
2017-10-30  8   8   7   0   0   0   0   0
2017-11-06  5   5   2   0   0   0   0   0

So the first column has the week, the second has number of people that signed up that week. Say we look at week 2017-09-18. 6 people signed up that week. The 3 under the w1 column means that 3 people out of that 6 made a purchase the week they signed up. The 1 under w2 means 1 person out of that 6 made a purchase the second week they were signed up, and so on.

What query would I use to get a table that looks like that?

CovertIII
  • 1,053
  • 1
  • 11
  • 18

2 Answers2

4

This query is modified from the one I wrote here: Cohort analysis in SQL

Here's the final query:

SELECT
  STR_TO_DATE(CONCAT(tb.cohort, ' Monday'), '%X-%V %W') as date,
  size,
  w1,
  w2,
  w3,
  w4,
  w5,
  w6,
  w7
FROM (
  SELECT u.cohort, 
    IFNULL(SUM(s.Offset = 0), 0) w1,
    IFNULL(SUM(s.Offset = 1), 0) w2,
    IFNULL(SUM(s.Offset = 2), 0) w3,
    IFNULL(SUM(s.Offset = 3), 0) w4,
    IFNULL(SUM(s.Offset = 4), 0) w5,
    IFNULL(SUM(s.Offset = 5), 0) w6,
    IFNULL(SUM(s.Offset = 6), 0) w7
  FROM (
   SELECT
      UserId,
      DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
    FROM users
  ) as u
  LEFT JOIN (
      SELECT DISTINCT
      payments.UserId,
      FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
      FROM payments
      LEFT JOIN users ON (users.UserId = payments.UserId)
  ) as s ON s.UserId = u.UserId
  GROUP BY u.cohort
) as tb
LEFT JOIN (
  SELECT DATE_FORMAT(AddedDate, "%Y-%u") dt, COUNT(*) size FROM users GROUP BY dt
) size ON tb.cohort = size.dt

So the core of this is we grab the users and the date they signed up and format the date by year-week number, since we are doing a weekly cohort.

SELECT
  UserId,
  DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
FROM users

Since we want to group by the cohort we have to put this in a subquery in the FROM part of the query.

Then we want join the payment information on the users.

SELECT DISTINCT
  payments.UserId,
  FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
  FROM payments
  LEFT JOIN users ON (users.UserId = payments.UserId)

This will get unique weekly payment events per user by the numbers of weeks they have been a user. We use distinct because if a user made 2 purchase in one week, we don't want to count that as two users.

We don't just use the payments table, because some users may sign up and not have payments. So we select from the users table and join on the payments table.

You then group by the week - u.cohort. Then you aggregate on the week numbers to find out how many people made payments the weeks after they signed up.

The version of mysql I used had sql_mode set to only_full_group_by. So to get the cohort size I put the bulk of the query in subquery so I could join on the users to get the size of the cohort.

Further considerations:

Filter by weeks is simple. tb.cohort > start date and tb.cohort < end date where start and end date are formatted with "%Y-%u". To make the query more efficient you'll probably want to filter out payment events that don't fall within the date range as well so you're not joining on data you don't need.

You may want to consider using a calender table to cover cases where there are no user sign ups during the week.

Here's a fiddle with everything working: http://sqlfiddle.com/#!9/172dbe/1

CovertIII
  • 1,053
  • 1
  • 11
  • 18
0

To sort by months, you need to transfer the month to Offset

MONTH(payments.PaymentDate) AS Offset

Also to add a date selection with months

DATE_FORMAT(AddedDate, "%Y-%m") AS cohort_month

And add

ORDER BY tb.cohort_month ASC