0

I have a dataframe which is similar to:

grades=pd.DataFrame(columns=["person","course_code","grade"],data=[[1,101,2.0],[2,102,1.0],[3,103,3.0],[2,104,4.0],[1,102,5.0],[3,104,2.5],[2,101,1.0]])

On each row is the grade of a certain student in certain subject.

And want to convert it to another that looks like this:

students=pd.DataFrame(columns=[101,102,103,104],data [[2.0,5.0,"NaN","NaN"],[1.0,1.0,"Nan",4.0],["Nan","Nan",3.0,2.5]])

On each row is a student (codex of the row) with the different grades obtained in every subject (every column is a different subject).

I have tried doing this:

for subj in grades["COURSE_CODE"].unique():
    grades_subj=grades[grades["COURSE_CODE"]==subj]
    grades_subj = grades_subj.set_index("EXPEDIENT_CODE", drop = True)
    for st in grades["EXPEDIENT_CODE"].unique():
        grade_num=grades_subj.loc[st]["GRADE"]
        student.loc[st][subj]=grade_num

But I get:

KeyError: 'the label [304208] is not in the [index]'

I have tried other ways too and get always errors...

Can someone help me, please?

MTT
  • 341
  • 5
  • 17

1 Answers1

1

try:

grades.pivot_table(index='person', columns='course_code', values='grade')

The value argument let you to choose the aggregation column.

In order to answer your comment below, you can always add different levels when indexing. This is simply done by passing a list rather than a single string to index. Note you can do the same in columns. SO, based in the example you provide.

grades.pivot_table(index=['person','school'], columns='course_code', values ='grade')

After this I usually recommend to reset_index() unless you are fluent slicing and indexing with MultiIndex.

Also, if the correspondence is 1 to 1, you could merge both dataframes using the appropiate join.

Here you have all the information about Reshaping and Pivot Tables in Pandas.

Gonzalo Donoso
  • 657
  • 1
  • 6
  • 17
  • yes that's it. You could add a link to the docs, though. – Anderas Mar 29 '18 at 13:16
  • @Anderas thanks for the suggestion! – Gonzalo Donoso Mar 29 '18 at 13:18
  • Thanks! That works perfectly with the example I gave you, but what if I have other columns that I do not want to pivot? (Student information that I want to keep in the same row and like in the previous dataframe) Should I merge afterwards or is it possible to pivot only certain columns? – MTT Mar 29 '18 at 13:30
  • You can choose what column to aggregate when pivoting using the argument `value=`. See my edited answer. If it does not answer your question, please provide a little example of what you want. – Gonzalo Donoso Mar 29 '18 at 14:01
  • Thanks @GonzaloDonoso. Imagine my first dataframe was: grades=pd.DataFrame(columns=["person","school","course_code","grade"],data=[[1,"a",101,2.0],[2,"b",102,1.0],[3,"c",103,3.0],[2,"b",104,4.0],[1,"a",102,5.0],[3,"c",104,2.5],[2,"b",101,1.0]]) And I want to keep the column school without pivoting, that is: students=pd.DataFrame(columns=["school",101,102,103,104],data= [["a",2.0,5.0,"NaN","NaN"],["b",1.0,1.0,"Nan",4.0],["c","Nan","Nan",3.0,2.5]]) – MTT Mar 29 '18 at 15:02
  • Hello, @MTT. Please, see edited answer. I would also recommend to take a look to https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe – Gonzalo Donoso Mar 29 '18 at 15:11