2

I have a pandas dataframe like below:

id val  date    period1 period2  period3  
 1  4   05/03     1         2         3  
 2  6   06/03     4         5         6  
 3  2   07/03     7         8         9  
 4  9   08/03     5         7         1  

I want to transform this from wide to long based on date and period like below:

id val  date  period  data 
 1  4   05/03  06/03    1  
               07/03    2 
               08/03    3  
 2  6   06/03  07/03    4 
               08/03    5   
               09/03    6  
 3  2   07/03  08/03    7     
               09/03    8     
               10/03    9  
 4  9   08/03  09/03    5      
               10/03    7     
               11/03    1  

That is the value of period will be the value of date+1month(year may also change) and so on, with its corresponding value coming in data column, while the rest of the dataframe remains the same.

How can I achieve this?

Ank
  • 1,864
  • 4
  • 31
  • 51

2 Answers2

2

First reshape values by DataFrame.set_index with DataFrame.stack.

Then convert column date to datetimes and to month periods by Series.dt.to_period, for add counter per groups by GroupBy.cumcount and change format by Series.dt.strftime:

df = (df.set_index(['id','val','date'])
        .rename_axis('period', axis=1)
        .stack()
        .reset_index(name='data'))
s = pd.to_datetime(df['date'], format='%m/%y').dt.to_period('m')
df['period'] = df.groupby(['id','val','date']).cumcount().add(s + 1).dt.strftime('%m/%y')
print (df)

    id  val   date period  data
0    1    4  05/03  06/03     1
1    1    4  05/03  07/03     2
2    1    4  05/03  08/03     3
3    2    6  06/03  07/03     4
4    2    6  06/03  08/03     5
5    2    6  06/03  09/03     6
6    3    2  07/03  08/03     7
7    3    2  07/03  09/03     8
8    3    2  07/03  10/03     9
9    4    9  08/03  09/03     5
10   4    9  08/03  10/03     7
11   4    9  08/03  11/03     1

Last if need empty values to first 3 columns - is it possible by DataFrame.duplicated, but get mixed values - numeric with strings, so numeric functions should failed:

df.loc[df.duplicated(['id','val','date']), ['id','val','date']] = ''
print (df)
   id val   date period  data
0   1   4  05/03  06/03     1
1                 07/03     2
2                 08/03     3
3   2   6  06/03  07/03     4
4                 08/03     5
5                 09/03     6
6   3   2  07/03  08/03     7
7                 09/03     8
8                 10/03     9
9   4   9  08/03  09/03     5
10                10/03     7
11                11/03     1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

Solution using wide_to_long

df1 = (pd.wide_to_long(df, stubnames='period', j='p', i=['id', 'val', 'date'])
         .rename(columns={'period': 'data'}).reset_index())

df1['p'] = ((pd.to_datetime(df1.date, format='%m/%y').dt.to_period('M') 
            + df1.p).dt.strftime('%m/%y'))

df1.rename(columns={'p': 'period'})


Out[193]:
    id  val   date period  data
0    1    4  05/03  06/03     1
1    1    4  05/03  07/03     2
2    1    4  05/03  08/03     3
3    2    6  06/03  07/03     4
4    2    6  06/03  08/03     5
5    2    6  06/03  09/03     6
6    3    2  07/03  08/03     7
7    3    2  07/03  09/03     8
8    3    2  07/03  10/03     9
9    4    9  08/03  09/03     5
10   4    9  08/03  10/03     7
11   4    9  08/03  11/03     1
Andy L.
  • 24,909
  • 4
  • 17
  • 29