1

Here is the dataframe with some NaN values,

data = {'Number':[100,None,None,200,150,None,100,120,110,210,120],
    'Street':['A','B','C','D','C','D','A','B','B','D','B']}
df =pd.DataFrame(data)
df

Output:

    Number  Street
0   100.0   A
1   NaN     B
2   NaN     C
3   200.0   D
4   150.0   C
5   NaN     D
6   100.0   A
7   120.0   B
8   110.0   B
9   210.0   D
10  120.0   B

I want to replace the NaN values of the column 'Number' by the mode of the same column with respect to the column 'Street'.

The output I need is:

    Number  Street
0   100       A
1   120       B
2   150       C
3   200       D
4   150       C
5   200       D
6   100       A
7   120       B
8   110       B
9   210       D
10  120       B

Explanation:

For example, consider row 1 which has a NaN value in the column Number and the corresponding Street is B . The NaN value of Number should be replaced by 120.0 which is the mode of the column Number with respect to Street. Because, the other values for the column Number for Street B are 120.0, 110.0 and 120.0 (look at row nos 7,8,10), and the mode for this is 120.0.

Abhi
  • 41
  • 6

2 Answers2

1

Use GroupBy.transform with lambda function for return first mode and replace missing values by Series.fillna:

f = lambda x: x.mode().iat[0]
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))

Or:

f = lambda x: fillna(x.mode().iat[0])
df['Number'] = df.groupby('Street')['Number'].transform(f)

print (df)
    Number Street
0    100.0      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6    100.0      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B

Here is possible error if some group has only NaN/Nones:

IndexError: index 0 is out of bounds for axis 0 with size 0

Then solution is:

data = {'Number':[None,None,None,200,150,None,None,120,110,210,120],
    'Street':['A','B','C','D','C','D','A','B','B','D','B']}
df =pd.DataFrame(data)
print (df)
    Number Street
0      NaN      A
1      NaN      B
2      NaN      C
3    200.0      D
4    150.0      C
5      NaN      D
6      NaN      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B

f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))
print (df)
    Number Street
0      NaN      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6      NaN      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Maybe a bit simpler, as mode returns an array So, you can impute Nan values within Number by grabbing the first element of it to get the results.

Solution 1:

>>> df['Number'] = df.groupby('Street')['Number'].apply(lambda x: x.fillna(x.mode()[0]))
    # df['Number'] = df.groupby('Street').transform(lambda x: x.fillna(x.mode()[0]))
>>> df
    Number Street
0    100.0      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6    100.0      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B

Solution 2:

You can see the another solution here based on the loc and first_valid_index

df['Number'] = df.groupby('Street')['Number'].transform(lambda s: s.loc[s.first_valid_index()])

OR

df.assign(Number=df.groupby(['Street']).Number.apply(lambda x: x.fillna(x.mode()[0])))

  or

df.assign(Number=df.groupby(['Street']).transform(lambda x: x.fillna(x.mode()[0])))
Karn Kumar
  • 8,518
  • 3
  • 27
  • 53