I have been working with a dataframe in Pandas that contains duplicate entries along with non-duplicates in a column. The dataframe looks something like this:
country_name values category
0 country_1 10 a
1 country_2 20 b
2 country_1 50 a
3 country_2 10 b
4 country_3 100 c
5 country_4 10 d
I want to write something that converts(replaces) duplicates with their mean values in my dataframe. An ideal output would be something similar to the following:
country_name values category
0 country_1 30 a
1 country_2 15 b
2 country_3 100 c
3 country_4 10 d
I have been struggling with this for a while so I would appreciate any help. I have forgotten to add category column. The problem with groupby()
method as you now when you call mean()
it does not return category
column back. My solution was to take numerical columns and the column that has duplicates together apply groupby().mean()
then concatenate back to the categorical columns. So I am looking for a solution shorten than what I have done.
My method get tedious when you are dealing with many categorical columns.