Dummy df:
import pandas as pd

# Note: every value, including id and is_correct, is stored as a string here.
columns = ['id', 'answer', 'is_correct']
data = [['1', 'hello', '1.0'],
        ['1', 'hi', '1.0'],
        ['1', 'bye', '0.0'],
        ['2', 'dog', '0.0'],
        ['2', 'cat', '1.0'],
        ['2', 'dog', '0.0'],
        ['3', 'Milan', '1.0'],
        ['3', 'Paris', '0.0'],
        ['3', 'Paris', '0.0'],
        ['3', 'Milannnn', '1.0']]
df = pd.DataFrame(columns=columns, data=data)
I want to create a new df with the following columns:
headers = ['id', 'number_of_different_correct_answers', 'number_of_different_incorrect_answers']
id should equal id from the dummy df.
For each id, I want to count the number of different correct answers (is_correct == 1.0) and, likewise, the number of different incorrect answers (is_correct == 0.0). By "different" I mean distinct: within id 2, dog appears twice with is_correct == 0.0, so it should only count as 1.
Based on the dummy df, the new df would look like this:
id  number_of_different_correct_answers  number_of_different_incorrect_answers
1   2                                    1
2   1                                    1
3   2                                    1
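For reference, here is a minimal sketch of one way the table above could be produced. It is only one possible approach, not necessarily the best one. It assumes is_correct holds the strings '1.0' and '0.0' as in the dummy df (hence the conversion to float), and count_distinct is a hypothetical helper name introduced just for this illustration.

```python
import pandas as pd

columns = ['id', 'answer', 'is_correct']
data = [['1', 'hello', '1.0'], ['1', 'hi', '1.0'], ['1', 'bye', '0.0'],
        ['2', 'dog', '0.0'], ['2', 'cat', '1.0'], ['2', 'dog', '0.0'],
        ['3', 'Milan', '1.0'], ['3', 'Paris', '0.0'], ['3', 'Paris', '0.0'],
        ['3', 'Milannnn', '1.0']]
df = pd.DataFrame(columns=columns, data=data)

# Assumption: is_correct is stored as text, so convert before comparing.
correct = df['is_correct'].astype(float) == 1.0


def count_distinct(mask, name):
    """Count distinct answers per id among the rows selected by mask."""
    return df[mask].groupby('id')['answer'].nunique().rename(name)


result = (
    pd.concat(
        [count_distinct(correct, 'number_of_different_correct_answers'),
         count_distinct(~correct, 'number_of_different_incorrect_answers')],
        axis=1,
    )
    .fillna(0)          # an id with no rows on one side gets a count of 0
    .astype(int)
    .rename_axis('id')
    .reset_index()
)
print(result)
```

Filtering first and then calling nunique() on answer is what makes the repeated dog for id 2 count only once. The fillna(0) step is there for ids that have only correct or only incorrect answers, which does not occur in the dummy df.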