I have two dataframes containing information about student grades and test scores. The first looks like this:
ID Test_Score Class1 Class2 Class3
0 001 85 B- A C+
1 002 78 B NaN B+
2 003 93 A B NaN
...
The second looks like this:
0 1
0 Algebra A
0 Calculus_1 B
0 Calculus_2 C-
1 Algebra C+
1 Trig F
1 Trig C
1 Calculus_1 C-
...
The indices in each dataframe refer to the same student. So the student at index 0 is the same in both dataframes.
What I want to do is create a pivot table from the second dataframe, where the rows correspond to students (i.e. indices), the columns are the math classes, and the values are the highest grade they made in each class (since it's possible for a student to have taken a class more than once). I will then concatenate this to the first dataframe.
I have already written an aggregation function that finds the highest grade.
I tried the following knowing it would fail:
p = pd.pivot_table(u, columns=0, values=1, aggfunc=highest)
Instead of returning a dataframe with a row for each student, I get a single row that contains the highest grade that any student made in the given class:
Algebra Trig Precalculus Calculus_1 Calculus_2
1 A A+ A+ A A
The way I've framed the problem in my mind, I just need to aggregate across the rows which share in index, which I am unsure how to do.
I'm also open to an entirely different approach.
Desired output:
ID Test_Score Class1 Class2 Class3 Algebra Trig ...
0 001 85 B- A C+ A NaN
1 002 78 B NaN B+ C+ C
2 003 93 A B NaN B B-
...
EDIT: Here is the code for my 'highest' function:
def highest(x):
q = 0
z = None
for g in x:
if qpoints(g) > q:
q = qpoints(g)
z = g
return z
where qpoints is the following function that I was already using elsewhere:
def qpoints(x):
qvalue = {'W': 0,
'F': 0,
'D': 1.0,
'D+': 1.33,
'C-': 1.67,
'C': 2.0,
'C+': 2.33,
'B-': 2.67,
'B': 3.0,
'B+': 3.33,
'A-': 3.84,
'A': 4.0,
'A+': 4.0}
return qvalue[x]