0

So I am trying to create a rolling 30 day window of new and returning users using google bigquery sql. Initially I used the following post as inspiration to get the amount of active users for this 30 day rolling window: Count unique ids in a rolling time frame

Now I would like to segment these users into groups of new and returning users where a new user is defined as user has registered in the last 30 days and a returning user already has an account for longer than 30 days. The problem that I run into is that some users will be considered as both new and returning users in some time frame because on some event dates they were still new users while on other event_dates they became returning users. Let's consider the below example how my initial dataset looks like:

Example image

Find below my query:

-- Make an array of unique users per day
WITH unique_users_per_day AS (
   -- Total Users
   SELECT event_date_dt
   , user_type
   , STRING_AGG(DISTINCT user_pseudo_id) AS unique_users
   FROM `dataset.online_users_per_day`
   WHERE users_onboarded_ind = 1 
   GROUP BY event_date_dt, user_type
 ), 

 -- Now concatenate all these arrays for shifting 30 days periods. Arrays do no longer contain unique users
aggregated_daily_users AS (
   SELECT event_date_dt
   , user_type
   , STRING_AGG(unique_users) OVER(PARTITION BY user_type ORDER BY UNIX_DATE(event_date_dt) 
                              RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) as users
   FROM unique_users_per_day
 ),

 -- Now get back to the unique users per 30 day sliding period
rolling_30day_window_users AS (
  SELECT event_date_dt
  , user_type
  , (SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(users)) AS id) Unique30Days
  FROM aggregated_daily_users
)

-- Calculate new_users in the 30 day sliding period by subtracting returning users from total users
SELECT DISTINCT event_date_dt
, user_type
, Unique30Days
FROM rolling_30day_window_users

For my watch date or reporting date 2020-06-02, the user with user_id 'user_1' is categorized as both a new and returning user while effectively it should only be categorized as a returning user. How could I prevent this issue of this user to be double categorized ? Note that I am trying to do this for a 1000+ daily users.

Bobby en Bezemer
  • 103
  • 1
  • 1
  • 4
  • 1
    Can you share your query? – Felipe Hoffa Jun 04 '20 at 23:22
  • Hi @FelipeHoffa, I've added my query – Bobby en Bezemer Jun 05 '20 at 13:43
  • Can you please confirm that the last statement of the SQL query is complete? I'm asking because it's comment mentions subtracting returning users from total users but I do not see that logic in the query. In order to better understand your problem it would also be useful to have an example of the current output and of the desired output. – Daniel Duato Jun 10 '20 at 13:08

0 Answers0