My data is currently in a long format. Below is a sample:
ID Date Performance Location Attrition
1 2001-01-31 2.380 NYC 0
1 2001-02-28 2.440 NYC 0
1 2001-03-31 2.460 RVA 1
2 2001-01-31 2.780 NYC 0
2 2001-02-28 2.800 NYC 0
2 2001-03-31 2.700 NYC 0
2 2001-04-30 2.300 NYC 1
I would like to reshape it into a wide format by "ID" and "DATE" like so:
ID Date(Start) Date(End) Performance Attrition Horizon
1 2001-01-31 2001-02-28 2.380 0 1
1 2001-01-31 2001-03-31 2.380 1 2
1 2001-02-28 2001-03-31 2.440 1 1
2 2001-01-31 2001-02-28 2.380 0 1
2 2001-01-31 2001-03-31 2.780 0 2
2 2001-01-31 2001-04-30 2.780 1 3
2 2001-02-28 2001-03-31 2.800 0 1
2 2001-02-28 2001-04-30 2.800 1 2
2 2001-03-31 2001-04-30 2.700 1 1
I tried the solution posted here Pandas long to wide reshape, by two variables and have this:
df['idx'] = (df.groupby(['ID', 'DATE']).cumcount()) +1)
df1 = (df.pivot_table(index=['ID', 'DATE'],
columns=['idx'],
values=['Performance'],
aggfunc='first'))
But this solution is not able to repeat the rows for the same person across each month. I also want to add a column to count for the number of changes within a month( Horizon).
The solution I can think of is too add another date:
df['Date(End)'] = df['Date'].shift(1)
and then
df['idx'] = (df.groupby(['ID', 'DATE','Date(End)']).cumcount()) +1)
df1 = (df.pivot_table(index=['ID', 'DATE','Date(End)'],columns=['idx'], values=['Performance'], aggfunc='first'))
If you can help me figure out what I am doing wrong here or if there is a better solution to achieving this data transformation.
Thank you in Advance.