I have the following dataframe:
df = pd.DataFrame({'student': list('AAABBBCCCC'),
'city': ['LA', 'LA', 'NY', 'DC', 'NY', 'NY', 'SF', 'SF', 'LA', 'SF'],
'score': [75, 27, 31, 22, 43, 20, 26, 40, 33, 20]})
print(df)
student city score
0 A LA 75
1 A LA 27
2 A NY 31
3 B DC 22
4 B NY 43
5 B NY 20
6 C SF 26
7 C SF 40
8 C LA 33
9 C SF 20
I have to replace the city
column based on which city appears most for that student. For example, here is the value_counts()
of the city for each student:
df.groupby('student').city.value_counts()
student city
A LA 2
NY 1
B NY 2
DC 1
C SF 3
LA 1
We can see, for the student A
, LA
appears most of the time. Hence, we want to replace other cities (here NY
) with LA
.
Desired output:
student city score
0 A LA 75
1 A LA 27
2 A LA 31
3 B NY 22
4 B NY 43
5 B NY 20
6 C SF 26
7 C SF 40
8 C SF 33
9 C SF 20
What would be the ideal way of getting the desired output? Any suggestions would be appreciated. Thanks!