So I am trying to create a rolling 30 day window of new and returning users using google bigquery sql. Initially I used the following post as inspiration to get the amount of active users for this 30 day rolling window: Count unique ids in a rolling time frame
Now I would like to segment these users into groups of new and returning users where a new user is defined as user has registered in the last 30 days and a returning user already has an account for longer than 30 days. The problem that I run into is that some users will be considered as both new and returning users in some time frame because on some event dates they were still new users while on other event_dates they became returning users. Let's consider the below example how my initial dataset looks like:
Find below my query:
-- Make an array of unique users per day
WITH unique_users_per_day AS (
-- Total Users
SELECT event_date_dt
, user_type
, STRING_AGG(DISTINCT user_pseudo_id) AS unique_users
FROM `dataset.online_users_per_day`
WHERE users_onboarded_ind = 1
GROUP BY event_date_dt, user_type
),
-- Now concatenate all these arrays for shifting 30 days periods. Arrays do no longer contain unique users
aggregated_daily_users AS (
SELECT event_date_dt
, user_type
, STRING_AGG(unique_users) OVER(PARTITION BY user_type ORDER BY UNIX_DATE(event_date_dt)
RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) as users
FROM unique_users_per_day
),
-- Now get back to the unique users per 30 day sliding period
rolling_30day_window_users AS (
SELECT event_date_dt
, user_type
, (SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(users)) AS id) Unique30Days
FROM aggregated_daily_users
)
-- Calculate new_users in the 30 day sliding period by subtracting returning users from total users
SELECT DISTINCT event_date_dt
, user_type
, Unique30Days
FROM rolling_30day_window_users
For my watch date or reporting date 2020-06-02, the user with user_id 'user_1' is categorized as both a new and returning user while effectively it should only be categorized as a returning user. How could I prevent this issue of this user to be double categorized ? Note that I am trying to do this for a 1000+ daily users.