I need to analyse the final 60 days up to the last date each user was active.
My dataframe contains the dates ('CalendarDate') that each user ('DataSourceId') was active ('Activity' an integer) - one row per date. I have grouped the dataframe by DataSourceId so I have dates in the columns and I have grabbed the last day each user was active 'max_date':
df['max_date'] = df.groupby('DataSourceId')['CalendarDate'].transform('max')
The data look something like this although 'CalendarDate' and 'max_date' are actually datetime64[ns]
format (the Activity values are float64
):
ID Jan1 Jan2 Jan3 Jan4 Jan5... max_date
1 8 15 10 Jan5
2 2 13 Jan3
3 6 11 Jan2
Now, I want to realign the columns from calendar dates to "last x days" for each row. Like this:
ID Last Last-1 Last-2 Last-3 ... Last-x
1 10 15 8
2 13 2
3 11 6
I have not been able to find any examples of similar transformations and am really stranded here.
EDITED: After adapting jezrael's solution I noticed it failed on occassion.
I think the problem is related to this code in jezrael's solution: r = data_wide.bfill().isna().sum(axis=1).values
Example: This data fails (and r = [0 3]
):
CalendarDate 2017-07-02 2017-07-03 2017-07-06 2017-07-07 2017-07-08 2017-07-09
DataSourceId
1000648 NaN 188.37 178.37 NaN 128.37 18.37
1004507 51.19 NaN 52.19 53.19 NaN NaN
Specifically the realigned dataframe looks like this:
Last-0 Last-1 Last-2 Last-3 Last-4 Last-5
DataSourceId
1000648 18.37 128.37 NaN 178.37 188.37 NaN
1004507 52.19 NaN 51.19 NaN NaN 53.19
If I change the order in the dataframe by changing ID 1000648 to 1100648 (so that it becomes the second row) this is the result (r = [0 2]
):
Last-0 Last-1 Last-2 Last-3 Last-4 Last-5
DataSourceId
1004507 NaN NaN 53.19 52.19 NaN 51.19
1100648 NaN 178.37 188.37 NaN 18.37 128.37