1

I am trying to unstack a column in python but it isn't quite doing what I am expecting. My table looks similar to this:

 Station_id   year   month   Day1   Day2 
 210018       1916     1      4        7
                       2      6      NaN
                       3      2       1
 256700       1917     1      NaN     8
                       2       6      9
                       3       2      0

I want to unstack by month so that all the days from a months are in one row. The two days from month one would then start first followed by the 2 days from month two then the 2 days from month three and so on. I no longer need the month column after this and I have tried deleting it and unstacking but it won't work.

The table would look like this:

 Station_id   year 
 210018       1916         4   7  6  NaN  2  1
 256700       1917        NaN  8  6   9   2  0   

When I try df.unstack(2) right now it returns a result that looks like this:

Station_id   year 
 210018       1916         4   6  2  7  NaN   1
 256700       1917        NaN  6  2  8   9    0

Any help would be much appreciated

Paul H
  • 65,268
  • 20
  • 159
  • 136
Stefano Potter
  • 3,467
  • 10
  • 45
  • 82
  • if you include the column names in the unstacked version, I think you'll see why what pandas is returning is correct. – Paul H Mar 02 '15 at 01:51
  • Yes I don't doubt its correct, I am just unsure how to get to the result I need. – Stefano Potter Mar 02 '15 at 01:56
  • if you're going to be working with pandas a lot and asking questions, I *strongly* recommend that you read through this: http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Paul H Mar 02 '15 at 02:07
  • I will read through it, thanks for the help (it works great!) and the the link to the resource. – Stefano Potter Mar 02 '15 at 02:27

1 Answers1

2

There's nothing wrong with what pandas is doing. You just need to swap around some column levels and then sort them.

take your data and making a reproducible example (python 3) you get this:

from io import StringIO
import pandas

datafile = StringIO("""\
Station_id  year  month  Day1  Day2
     210018  1916      1     4     7
     210018  1916      2     6   NaN
     210018  1916      3     2     1
     256700  1917      1   NaN     8
     256700  1917      2     6     9
     256700  1917      3     2     0
""")

df = pandas.read_table(datafile, sep='\s+', engine='python', 
                       index_col=['Station_id', 'year', 'month'])
print(df.unstack(level='month'))

               Day1       Day2       
month              1  2  3    1   2  3
Station_id year                       
210018     1916    4  6  2    7 NaN  1
256700     1917  NaN  6  2    8   9  0

In your mind, you think it'd be logical that months be on top, but you've given pandas no way to know that this would make sense. So you have to do it yourself:

df = pandas.read_table(datafile, sep='\s+', engine='python', 
                       index_col=['Station_id', 'year', 'month'])
xtab = df.unstack(level='month')
xtab.columns = xtab.columns.swaplevel(0, 1)
xtab = xtab.sort(axis=1)
print(xtab)


month              1         2         3     
                Day1 Day2 Day1 Day2 Day1 Day2
Station_id year                              
210018     1916    4    7    6  NaN    2    1
256700     1917  NaN    8    6    9    2    0
Paul H
  • 65,268
  • 20
  • 159
  • 136