I want to find the solution for this,
Provided a table with user_id and the dates they visited the platform, find the top 100 users with the longest continuous streak of visiting the platform as of yesterday.
I found these links that explain how to do this for one user. However, I am not sure how to do it for multiple users.
One naive might be to get all unique users and using a for loop and above answer, get the users with a maximum continuous visiting streak. However, I am interested in a vectorised way if possible.
If needed, these are the codes I used,
date_series = pd.Series(np.random.randint(0,10, 400), index=pd.to_datetime(np.random.randint(0,20, 400)*1e9*24*3600), name="uid")
df = date_series.reset_index().rename({"index":"date_val"}, axis=1).drop_duplicates().reset_index(drop=True)
For a given user id (say uid =1), I can use the following to find the max streak,
sub_df = df[df.uid==1].sort_values("date_val")
(sub_df.date_val+pd.Timedelta(days=1) != sub_df.date_val.shift(-1)).cumsum().value_counts().max()
But I don't understand how to do a similar thing for all users in the original dataframe (df) with a vectorized (not for loop) approach.