Python - find items with multiple occurrences and replace with mean

Question

For df:

sample    type    count
sample1   red     5
sample1   red     7
sample1   green   3
sample2   red     2
sample2   green   8
sample2   green   8
sample2   green   2
sample3   red     4
sample3   blue    5
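To make the example reproducible, the frame can be rebuilt from the table. This is a minimal sketch using plain pandas; the values are copied from the table above.

```python
import pandas as pd

# Rebuild the sample frame from the table above
df = pd.DataFrame({
    "sample": ["sample1"] * 3 + ["sample2"] * 4 + ["sample3"] * 2,
    "type": ["red", "red", "green",
             "red", "green", "green", "green",
             "red", "blue"],
    "count": [5, 7, 3, 2, 8, 8, 2, 4, 5],
})

print(df)
```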

I would like to find the "type" values that occur more than once within the same "sample" and replace those rows with a single row whose "count" is the mean count. So the expected output is:

sample    type    count
sample1   red     6
sample1   green   3
sample2   red     2
sample2   green   6
sample3   red     4
sample3   blue    5
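The two averaged rows in the expected output work out as:

$$\bar{c}_{\text{sample1, red}} = \frac{5 + 7}{2} = 6, \qquad \bar{c}_{\text{sample2, green}} = \frac{8 + 8 + 2}{3} = 6$$

Every other (sample, type) pair occurs only once, so its count stays as it is.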

So far, this:

non_uniq = df.groupby("sample")["type"].value_counts()
non_uniq = non_uniq.where(non_uniq > 1).dropna()

finds the "type" values that occur more than once per sample, but I don't know how to match the result back to the rows of df.
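For reference, one way to match `non_uniq` back to `df` is to turn each row's (sample, type) pair into a MultiIndex and test membership against `non_uniq.index`. A minimal sketch, assuming the `df` from the table; `MultiIndex.from_frame` and `MultiIndex.isin` are standard pandas, while the names `pairs`, `m`, `means` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "sample": ["sample1"] * 3 + ["sample2"] * 4 + ["sample3"] * 2,
    "type": ["red", "red", "green", "red", "green", "green", "green", "red", "blue"],
    "count": [5, 7, 3, 2, 8, 8, 2, 4, 5],
})

# (sample, type) pairs that occur more than once, as in the question
non_uniq = df.groupby("sample")["type"].value_counts()
non_uniq = non_uniq.where(non_uniq > 1).dropna()

# One (sample, type) tuple per row, tested against non_uniq's MultiIndex
pairs = pd.MultiIndex.from_frame(df[["sample", "type"]])
m = pairs.isin(non_uniq.index)  # boolean NumPy array, one entry per row

# Average only the duplicated pairs, then append the untouched rows
means = df[m].groupby(["sample", "type"], as_index=False, sort=False)["count"].mean()
result = pd.concat([means, df[~m]], ignore_index=True)
print(result)
```

The mean turns the "count" column into float64, so the concatenated result holds floats even for the rows that were not averaged.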

Dupes: [one](https://stackoverflow.com/questions/30482071/how-to-calculate-mean-values-grouped-on-another-column-in-pandas), [two](https://stackoverflow.com/questions/30328646/python-pandas-group-by-in-group-by-and-average), [three](https://stackoverflow.com/questions/46938572/pandas-groupby-mean-into-a-dataframe), [four](https://stackoverflow.com/questions/53287976/getting-the-average-value-for-each-group-of-a-pandas-dataframe) etc — rafaelc, Dec 19 '19 at 14:26

jezrael · Answer 1 · 2019-12-19T14:32:00.570

I believe you can simplify the solution to taking the mean of every group, because the mean of a group with a single value is just that value:

df = df.groupby(["sample","type"], as_index=False, sort=False)["count"].mean()
print(df)
    sample   type  count
0  sample1    red    6.0
1  sample1  green    3.0
2  sample2    red    2.0
3  sample2  green    6.0
4  sample3    red    4.0
5  sample3   blue    5.0
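A variant of the same idea keeps the original row order and can restore integer counts: broadcast each group's mean back onto its rows with `transform("mean")`, then keep the first row of every (sample, type) pair. A sketch, assuming the sample `df`; the cast back to int is an assumption about what you want and is only applied when every mean is a whole number, as it is here.

```python
import pandas as pd

df = pd.DataFrame({
    "sample": ["sample1"] * 3 + ["sample2"] * 4 + ["sample3"] * 2,
    "type": ["red", "red", "green", "red", "green", "green", "green", "red", "blue"],
    "count": [5, 7, 3, 2, 8, 8, 2, 4, 5],
})

# Replace every count with the mean of its (sample, type) group;
# singleton groups keep their own value
out = df.assign(count=df.groupby(["sample", "type"])["count"].transform("mean"))

# Keep one row per (sample, type), in order of first appearance
out = out.drop_duplicates(subset=["sample", "type"]).reset_index(drop=True)

# mean() yields float64; cast back only if no fractional means would be lost
if (out["count"] % 1 == 0).all():
    out["count"] = out["count"].astype(int)
print(out)
```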

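Your own approach can be adapted like this: flag the rows whose (sample, type) pair occurs more than once, average only those, and append the remaining rows unchanged: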

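import pandas as pd

# True for rows whose (sample, type) pair occurs more than once
m = df.groupby(["sample", "type"])["type"].transform("size") > 1
# Average only the duplicated pairs
df1 = df[m].groupby(["sample","type"], as_index=False, sort=False)["count"].mean()

# Averaged rows first, then the rows that were already unique
df = pd.concat([df1, df[~m]], ignore_index=True)
print(df)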
    sample   type  count
0  sample1    red    6.0
1  sample2  green    6.0
2  sample1  green    3.0
3  sample2    red    2.0
4  sample3    red    4.0
5  sample3   blue    5.0
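To check the claim at the top of this answer, that averaging every group gives the same values as averaging only the duplicated ones, both results can be sorted by key and compared. A sketch, assuming the sample `df`; `assert_frame_equal` comes from `pandas.testing`, and the helper `by_key` is hypothetical.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({
    "sample": ["sample1"] * 3 + ["sample2"] * 4 + ["sample3"] * 2,
    "type": ["red", "red", "green", "red", "green", "green", "green", "red", "blue"],
    "count": [5, 7, 3, 2, 8, 8, 2, 4, 5],
})

# Approach 1: mean over every (sample, type) group
full = df.groupby(["sample", "type"], as_index=False, sort=False)["count"].mean()

# Approach 2: mean over duplicated pairs only, untouched rows appended
m = df.groupby(["sample", "type"])["count"].transform("size") > 1
partial = pd.concat(
    [df[m].groupby(["sample", "type"], as_index=False, sort=False)["count"].mean(), df[~m]],
    ignore_index=True,
)

# Hypothetical helper: the two approaches order rows differently, so sort by key
def by_key(frame):
    return frame.sort_values(["sample", "type"]).reset_index(drop=True)

# check_dtype=False so the comparison is about values, not int vs float storage
assert_frame_equal(by_key(full), by_key(partial), check_dtype=False)
print("both approaches agree")
```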
