1

I have been given a large dataset that has data arranged like so:

location  cost    year
1         23.15    1986
1         23.91    1988
1         23.31    1989
1         23.91    1993
1         22.98    1994
1         23.99    1995
1         23.71    1997
1         23.01    1999
2         23.21    2000
2         24.28    2004
2         24.4     2005

I'd like to rearrange this so it is in the form:

location    1985    1986    1987   1988
1           20.00   20.00   20.0    20.0
2           20.00   20.00   20.0    20.0
3           20.00   20.00   20.0    20.0
4           20.00   20.00   20.0    20.0
5           20.00   20.00   20.0    20.0

(note: ignore that the new costs are all 20.0. My goal is to turn the values within the year column into headers, so that each location is only listed once, with the cost for a specific year located in that column.)

Is there a straightforward way to do this? I've looked into groupy and transpose but have not been able to produce anything close to what I'd like.

Thank you in advance for any pointers you can provide.

Housefrog
  • 13
  • 5
  • Why are all the columns `20.0[0]`? – erip Dec 21 '15 at 17:17
  • Just for the sake of example position; the values aren't real. – Housefrog Dec 21 '15 at 17:18
  • 1
    It's a little confusing. Might be worth showing real values you'd expect. – erip Dec 21 '15 at 17:19
  • see http://stackoverflow.com/questions/22127569/opposite-of-melt-in-python-pandas – behzad.nouri Dec 21 '15 at 17:20
  • It might also be good to see your current code. – erip Dec 21 '15 at 17:21
  • I've clarified the example to better indicate the values are not real. If I could get the data in the form that I'd like with real values, I wouldn't be stuck on this problem. – Housefrog Dec 21 '15 at 17:25
  • Just to add to @erip's comment, if you used "real" data, we'd be able to see how you expect to handle null/mising/multiple values for a cell (doesn't have to be real data, but data that matches your example list) – Basic Dec 21 '15 at 17:26

1 Answers1

2

You need to use pivot_table:

pd.pivot_table(df, index='location', columns='year', values='cost', fill_value=0)

With your sample:

#Out[11]: 
#year       1986   1988   1989   1993   1994   1995   1997   1999   2000  \
#location                                                                  
#1         23.15  23.91  23.31  23.91  22.98  23.99  23.71  23.01   0.00   
#2          0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  23.21   

#year       2004  2005  
#location               
#1          0.00   0.0  
#2         24.28  24.4
Colonel Beauvel
  • 30,423
  • 11
  • 47
  • 87