1

I have a pandas dataframe with a datetime index that I'd like to reorient as a grid from a pandas time-series dataframe.

My dataframe looks like this:

DATE        VAL         
2007-06     0.008530
2007-07    -0.067069
2007-08     0.026660
2007-09     0.016237
2007-10     0.025145
2007-11    -0.063666
2007-12    -0.002118
2008-01    -0.059951
2008-02    -0.033422
2008-03     0.008978
2008-04     0.039997
2008-05     0.043563
2008-06    -0.076166
...

And I'd like to reorient, with [year] rows and [month] columns, to look like this:

      Jan       Feb      Mar    ...  Jun      Jul      Aug      Sep      Oct      Nov      Dec  
2007   0         0        0     ... .008530  -.067069 .026660  .016237  .025145  -.06366  -.025145 
2008  -.05995  -.033422 .00897  ... -.076166   ...
...

reshape/stack/unstack methods appear to do a version what I'd like, but as I have a single date index, these don't lend themselves to my dataframe.

Chris
  • 1,401
  • 4
  • 17
  • 28
  • Possibly answer already here: https://stackoverflow.com/questions/25146121/extracting-just-month-and-year-separately-from-pandas-datetime-column – SKPS Jan 28 '20 at 07:13
  • @SathishSanjeevi, not quite...the year and month values are available via `year` and `month` methods applied to the index, but that doesn't really help absent an application – Chris Jan 28 '20 at 07:16

1 Answers1

3

First convert DATE to datetimes and reshape by DataFrame.set_index with Series.unstack, for months use Series.dt.strftime. Last change order of columns names by DataFrame.reindex and remove index namd columns names by DataFrame.rename_axis:

df['DATE'] = pd.to_datetime(df['DATE'])
m = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df1 = (df.set_index([df['DATE'].dt.year, df['DATE'].dt.strftime('%b')])['VAL']
         .unstack(fill_value=0)
         .reindex(columns=m)
         .rename_axis(index=None, columns=None))
print (df1)
           Jan       Feb       Mar       Apr       May       Jun       Jul  \
2007  0.000000  0.000000  0.000000  0.000000  0.000000  0.008530 -0.067069   
2008 -0.059951 -0.033422  0.008978  0.039997  0.043563 -0.076166  0.000000   

          Aug       Sep       Oct       Nov       Dec  
2007  0.02666  0.016237  0.025145 -0.063666 -0.002118  
2008  0.00000  0.000000  0.000000  0.000000  0.000000  

Another solution is use DataFrame.pivot and for correct ordering is used ordered categoricals:

df['DATE'] = pd.to_datetime(df['DATE'])
m = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df1 = (df.assign(m = pd.Categorical(df['DATE'].dt.strftime('%b'), ordered=True, categories=m),
               y = df['DATE'].dt.year)
         .pivot('y','m','VAL')
         .fillna(0)
         .rename_axis(index=None, columns=None))

print (df1)
           Jan       Feb       Mar       Apr       May       Jun       Jul  \
2007  0.000000  0.000000  0.000000  0.000000  0.000000  0.008530 -0.067069   
2008 -0.059951 -0.033422  0.008978  0.039997  0.043563 -0.076166  0.000000   

          Aug       Sep       Oct       Nov       Dec  
2007  0.02666  0.016237  0.025145 -0.063666 -0.002118  
2008  0.00000  0.000000  0.000000  0.000000  0.000000  

Solution with rename columns:

d = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun',
     7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'}
df1 = (df.set_index([df.index.year, df.index.month]).VAL
         .unstack(fill_value=0)
         .rename(columns=d))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Great. Reindexing (as in the first example) was throwing things off, so I ditched it and added naming differently. Following works: `df = df.set_index([df.index.year, df.index.month]).VAL.unstack(fill_value=0) df.rename( columns={1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun', 7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'}, inplace=True )` – Chris Jan 28 '20 at 07:33
  • @Chris - yop, only remove `inplace=True` – jezrael Jan 28 '20 at 07:34
  • need that part...simply returned to the screen otherwise – Chris Jan 28 '20 at 07:36
  • @Chris - I think it is one line, if second line then is necessary – jezrael Jan 28 '20 at 07:36
  • @Chris - added new paragraph what I mean – jezrael Jan 28 '20 at 07:38
  • yup, that would work. appreciate it. **edit**: actually, don't even need the explicit `VAL` reference since the df consists of only the one column – Chris Jan 28 '20 at 07:42
  • @Chris - yop, agree, if always only 2 columns data should be omited, for geenral solutions with more columns is necessary. – jezrael Jan 28 '20 at 07:44