How would I transform df1 into df2 while keeping the time series in order? I believe it has something to do with unstacking, but I can't figure out the exact code. The key is that, for each name, the rows should go in chronological order by year. Thanks.

>>> import pandas as pd
>>> studentData = {
...     'year' : [2011,2012,2011,2012,2011,2012,2011,2012],
...     'name' : ['jack', 'jack','jack','jack','john','john','john','john'],
...     'subject' : ['math','math','science','science','history','history','science','science'],
...     'grade' : ['A', 'A','C','B+', 'B+','B','A','N/A']
... }
>>> 
>>> df1 = pd.DataFrame(studentData)
>>> df1
   year  name  subject grade
0  2011  jack     math     A
1  2012  jack     math     A
2  2011  jack  science     C
3  2012  jack  science    B+
4  2011  john  history    B+
5  2012  john  history     B
6  2011  john  science     A
7  2012  john  science   N/A
>>> 
>>> 
>>> studentData2 = {
...     'year' : [2011,2012,2011,2012],
...     'name' : ['jack','jack', 'john','john'],
...     'math' : ['A','A','N/A','N/A'],
...     'science':['C','B+','A','N/A'],
...     'history':['N/A','N/A','B+','B']
... }
>>> 
>>> df2 = pd.DataFrame(studentData2)
>>> df2
   year  name math science history
0  2011  jack    A       C     N/A
1  2012  jack    A      B+     N/A
2  2011  john  N/A       A      B+
3  2012  john  N/A     N/A       B
>>> 
AI92
  • And for your case, probably `df1.pivot(index='name', columns='subject', values='grade')`; see question 10 in the link above – Ben.T May 20 '20 at 20:41
  • No, I'm getting this error: `ValueError: Index contains duplicate entries, cannot reshape`. I should also change the DataFrame a bit so it reflects multilevel indexing, because I'm pivoting on not just Jack, but Jack from California vs. Jack from Ohio – AI92 May 20 '20 at 20:50
  • The error you get is explained in the first question of the link above, that may help you – Ben.T May 20 '20 at 20:56
  • With the edit, it is more like question 8 of the link, try `df1.set_index(['year', 'name', 'subject'])['grade'].unstack()` – Ben.T May 20 '20 at 21:19
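
For reference, the suggestion in the last comment can be sketched as follows. This is only a sketch built on the df1 from the question, not a confirmed answer: the variable name `wide`, the `fillna('N/A')` step, and the final column order are assumptions made to match the layout of df2. The `pivot` call with `index='name'` alone fails because every (name, subject) pair occurs once per year, so `year` has to be part of the index. Note that jack has no history rows in df1, so his history cells come out as N/A.

```python
import pandas as pd

df1 = pd.DataFrame({
    'year': [2011, 2012, 2011, 2012, 2011, 2012, 2011, 2012],
    'name': ['jack', 'jack', 'jack', 'jack', 'john', 'john', 'john', 'john'],
    'subject': ['math', 'math', 'science', 'science',
                'history', 'history', 'science', 'science'],
    'grade': ['A', 'A', 'C', 'B+', 'B+', 'B', 'A', 'N/A'],
})

# Pivoting on name alone fails: each (name, subject) pair appears twice,
# once per year, so the reshape is ambiguous.
try:
    df1.pivot(index='name', columns='subject', values='grade')
except ValueError as err:
    print(err)

# Including year in the index makes every (name, year, subject) key unique.
wide = (
    df1.set_index(['name', 'year', 'subject'])['grade']
       .unstack('subject')   # one column per subject
       .fillna('N/A')        # assumption: show missing grades as 'N/A', as in df2
       .sort_index()         # sort by name, then chronologically by year
       .reset_index()
)
wide.columns.name = None     # drop the leftover 'subject' label on the columns
wide = wide[['year', 'name', 'math', 'science', 'history']]  # df2's column order
print(wide)
```

If the real data also has a location column (Jack from California vs. Jack from Ohio), the same pattern should apply with that column added to the `set_index` list, provided the resulting keys are still unique.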

0 Answers