How to fill missing values in a DataFrame with the most frequent value of each group?

Question

I have a pandas DataFrame with two columns: toy and color. The color column includes missing values.

How do I fill the missing color values with the most frequent color for that particular toy?

Here's the code to create a sample dataset:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'toy':['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color':['red', 'blue', 'blue', np.nan, 'green', np.nan,
             'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
    })

Here's the sample dataset:

      toy  color
0     car    red
1     car   blue
2     car   blue
3     car    NaN
4   train  green
5   train    NaN
6   train    red
7   train    red
8   train    NaN
9    ball   blue
10   ball    red
11   ball    NaN
12  truck  green

Here's the desired result:

Replace the first NaN with blue, since that is the most frequent color for a car.
Replace the second and third NaNs with red, since that is the most frequent color for a train.
Replace the fourth NaN with either blue or red, since they are tied for the most frequent color for a ball.

Notes about the real dataset:

There are many different toy types (not just four).
There are no toy types that only have missing values for color, so the answer does not need to handle that case.

This question is related, but it doesn't answer my question of how to use the most frequent value to fill in missing values.

Anurag Dabas · Accepted Answer · score 3 · 2021-08-19T13:55:25.373

You can use groupby()+transform()+fillna():

df['color'] = df['color'].fillna(df.groupby('toy')['color'].transform(lambda x: x.mode().iat[0]))
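
As a quick check, here is a sketch that runs the same steps on the sample dataset from the question. Two details are worth noting: `fillna` only touches rows where `color` is missing, and `Series.mode()` returns its values sorted, so `.iat[0]` resolves the `ball` tie to `blue`. The names `group_mode` and `filled` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green'],
})

# Most frequent color of each toy, broadcast back to every row of that toy.
group_mode = df.groupby('toy')['color'].transform(lambda x: x.mode().iat[0])

# Use the group mode only where color is missing; existing colors are kept.
filled = df['color'].fillna(group_mode)  # illustrative name, not from the answer

print(filled.tolist())
```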

OR

If you want to pick one at random when two or more values are tied for most frequent:

from random import choice

df['color'] = df['color'].fillna(df.groupby('toy')['color'].transform(lambda x: choice(x.mode())))
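
The `random.choice` version draws from Python's global random state, so the tie-break can differ between runs. Below is a minimal sketch of a reproducible variant, assuming a seeded NumPy generator is acceptable. `rng` and `random_mode` are names made up for this example. Because the function returns one value per group, every missing value within the same toy receives the same pick.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green'],
})

# Seeded generator so the tie-break is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)

def random_mode(colors):
    """Return one of the group's most frequent values, chosen at random."""
    modes = colors.mode()  # all values tied for most frequent, NaN excluded
    return modes.iloc[rng.integers(len(modes))]

df['color'] = df['color'].fillna(
    df.groupby('toy')['color'].transform(random_mode)
)
```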

edited Aug 19 '21 at 13:55

answered Aug 19 '21 at 13:51

Anurag Dabas

beat me to it, +1 ;) – mozway Aug 19 '21 at 13:52
Doesn't that replace all the colors with the most frequent color for each toy? i.e. the color for each car is replaced with blue. – norie Aug 19 '21 at 13:54
@norie ohh..yes corrected...thanks for noticing :) – Anurag Dabas Aug 19 '21 at 13:57

score 2 · Answer 2 · answered Aug 19 '21 at 13:56

You want to fillna with the mode:

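# group_keys=False keeps the original row index on the result; pandas 2.0+ needs this for the assignment to align
df["color"] = df.groupby("toy", group_keys=False)["color"].apply(lambda x: x.fillna(x.mode().iat[0]))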
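
An alternative sketch, not taken from either answer: compute one mode per toy with `agg`, then look each row's toy up in that table with `map`. This avoids any dependence on how `apply` indexes its result. `mode_per_toy` is a name chosen for this example, and ties again resolve to the first value in sorted order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "toy": ["car"] * 4 + ["train"] * 5 + ["ball"] * 3 + ["truck"],
    "color": ["red", "blue", "blue", np.nan, "green", np.nan,
              "red", "red", np.nan, "blue", "red", np.nan, "green"],
})

# One row per toy: its most frequent color (NaN is ignored by mode()).
mode_per_toy = df.groupby("toy")["color"].agg(lambda s: s.mode().iat[0])

# Map each row's toy to its mode and use it only where color is missing.
df["color"] = df["color"].fillna(df["toy"].map(mode_per_toy))
```

The lookup table is small (one entry per toy type), which also makes it easy to inspect before filling.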

answered Aug 19 '21 at 13:56

not_speshal
