2

My data is currently in a long format. Below is a sample:

     ID    Date             Performance Location     Attrition     
       1   2001-01-31           2.380       NYC            0
       1   2001-02-28           2.440       NYC            0
       1   2001-03-31           2.460       RVA            1
       2   2001-01-31           2.780       NYC            0
       2   2001-02-28           2.800       NYC            0
       2   2001-03-31           2.700       NYC            0
       2   2001-04-30           2.300       NYC            1

I would like to reshape it into a wide format by "ID" and "DATE" like so:

       ID    Date(Start)   Date(End)  Performance Attrition  Horizon   
       1   2001-01-31      2001-02-28     2.380   0          1
       1   2001-01-31      2001-03-31     2.380   1          2
       1   2001-02-28      2001-03-31     2.440   1          1
       2   2001-01-31      2001-02-28     2.380   0          1
       2   2001-01-31      2001-03-31     2.780   0          2
       2   2001-01-31      2001-04-30     2.780   1          3
       2   2001-02-28      2001-03-31     2.800   0          1
       2   2001-02-28      2001-04-30     2.800   1          2
       2   2001-03-31      2001-04-30     2.700   1          1

I tried the solution posted here Pandas long to wide reshape, by two variables and have this:

df['idx'] = (df.groupby(['ID', 'DATE']).cumcount()) +1)

df1 = (df.pivot_table(index=['ID', 'DATE'], 
                  columns=['idx'], 
                  values=['Performance'], 
                  aggfunc='first'))

But this solution is not able to repeat the rows for the same person across each month. I also want to add a column to count for the number of changes within a month( Horizon).

The solution I can think of is too add another date:

df['Date(End)'] = df['Date'].shift(1) and then
df['idx'] = (df.groupby(['ID', 'DATE','Date(End)']).cumcount()) +1)

df1 = (df.pivot_table(index=['ID', 'DATE','Date(End)'],columns=['idx'], values=['Performance'], aggfunc='first'))

If you can help me figure out what I am doing wrong here or if there is a better solution to achieving this data transformation.

Thank you in Advance.

0 Answers0