Replace missing data based on certain conditions

Question

Let's say I have data:

      a   b
0    1.0  NaN
1    6.0  1
2    3.0  NaN
3    1.0  NaN

I would like to iterate over this data and, wherever `b` is NaN **and** column `a` equals 1.0 in the same row, replace that NaN with 4, instead of replacing every NaN with 4. How should I go about it? I tried various for/if combinations and none of them worked. I also tried:

for i in df.itertuples():

but the problem is that df.itertuples() has no replace functionality (the rows it yields are read-only namedtuples), and the other methods I've seen replace values one at a time.

The end result I'm looking for:

      a   b
0    1.0  4
1    6.0  1
2    3.0  NaN
3    1.0  4
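For reference, the itertuples() loop from the question can be made to work. A minimal sketch, assuming the sample data above: the namedtuples are only read, and the write goes back into `df` through `df.at`, using the row label that itertuples() exposes as `row.Index`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 6.0, 3.0, 1.0], 'b': [np.nan, 1, np.nan, np.nan]})

# itertuples() yields read-only namedtuples, so the loop only reads from them;
# the assignment goes into df itself via df.at and the row's index label.
for row in df.itertuples():
    if row.a == 1.0 and pd.isna(row.b):
        df.at[row.Index, 'b'] = 4

print(df)
```

This is still a Python-level loop over every row; the vectorized answers below avoid that.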

Hi Daniyal, please see this topic; I believe it can be helpful for you: https://stackoverflow.com/questions/14162723/replacing-pandas-or-numpy-nan-with-a-none-to-use-with-mysqldb – Felipe Cabral, Oct 22 '20 at 03:04

score 0 · Answer 1 · answered Oct 22 '20 at 03:05


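import numpy as np
import pandas as pd

def func(x):
    # x is one row of df (a Series indexed by the column names)
    if x['a'] == 1 and pd.isna(x['b']):
        x = x.copy()  # don't mutate the object apply() passes in
        x['b'] = 4
    return x

df = pd.DataFrame.from_dict({'a': [1.0, 6.0, 3.0, 1.0], 'b': [np.nan, 1, np.nan, np.nan]})
df = df.apply(func, axis=1)  # apply() returns a new DataFrame, so assign it back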

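Instead of looping with iterrows() or itertuples(), apply() with axis=1 may be a better option: it calls the function once per row and returns the resulting DataFrame.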
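A variant of the same idea, offered as a sketch rather than the answerer's code: the hypothetical helper `new_b` returns only the new value for `b`, and the resulting Series is assigned back to that column, so column `a` is never touched. Note that apply(axis=1) still calls the Python function once per row; it is a convenience, not a vectorized speed-up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 6.0, 3.0, 1.0], 'b': [np.nan, 1, np.nan, np.nan]})

def new_b(row):
    # row is one row of df as a Series; return the value column b should hold
    if row['a'] == 1 and pd.isna(row['b']):
        return 4
    return row['b']

# With axis=1 and a scalar return value, apply() yields a Series aligned on df's index
df['b'] = df.apply(new_b, axis=1)
print(df)
```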

answered Oct 22 '20 at 03:05

Chris Tang


Should work! Even though I am not sure why df is defined later; I thought x should be df. – Oct 22 '20 at 04:13
@Daniyaldehleh `x` is the series, or the row in this case, of the `df`. `apply()` works on each row/column of the `df`, so the function should handle `x` instead of `df`. – Chris Tang Oct 22 '20 at 04:34
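To make the point in that reply concrete, here is a small sketch (using the same sample data) that inspects what apply() hands to the function when axis=1: each call receives one row as a Series, whose name is the row label and whose index is the column labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 6.0, 3.0, 1.0], 'b': [np.nan, 1, np.nan, np.nan]})

# With axis=1, apply() passes each row to the function as a Series.
kinds = df.apply(lambda x: type(x).__name__, axis=1)
# x.name is that row's index label ...
row_labels = df.apply(lambda x: x.name, axis=1)
# ... and x.index holds the column labels, which is why x['a'] and x['b'] work.
col_labels = df.apply(lambda x: ','.join(x.index), axis=1)

print(kinds.tolist())       # ['Series', 'Series', 'Series', 'Series']
print(row_labels.tolist())  # [0, 1, 2, 3]
print(col_labels.tolist())  # ['a,b', 'a,b', 'a,b', 'a,b']
```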

drcrisp · Answer 2 · 2020-10-22T05:55:11.897


As you said, you can achieve this by combining two conditions: a == 1 and b is NaN.

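To combine two conditions element-wise in pandas, use & (not Python's `and`), and wrap each comparison in parentheses, since & binds more tightly than ==.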
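A minimal sketch of the combined condition on the sample data: each side is a boolean Series, and & combines them row by row. Using Python's `and` here instead would raise a ValueError, because a whole Series has no single truth value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 6, 3, 1], 'b': [np.nan, 1, np.nan, np.nan]})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses around df['a'] == 1 are required because & binds more tightly than ==.
mask = (df['a'] == 1) & df['b'].isna()
print(mask.tolist())  # [True, False, False, True]
```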

In your example:

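import pandas as pd
import numpy as np

# Create sample data
d = {'a': [1, 6, 3, 1], 'b': [np.nan, 1, np.nan, np.nan]}
df = pd.DataFrame(data=d)

# Convert to numeric (values that cannot be parsed become NaN)
df = df.apply(pd.to_numeric, errors='coerce')
print(df)

# Replace NaNs in column b only, in rows where a == 1.
# Series.isna() works for any dtype, unlike np.isnan; .loc[..., 'b'] keeps
# the assignment out of column a (df[mask] = 4 would overwrite the whole row).
df.loc[(df['a'] == 1) & df['b'].isna(), 'b'] = 4
print(df)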

Should do the trick.
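Where the assignment targets matters. The sketch below (the `sample()` helper is hypothetical, just to build two fresh copies of the data) contrasts assigning through `df[mask]`, which sets every column of the matching rows, with `.loc[mask, 'b']`, which only touches `b`.

```python
import numpy as np
import pandas as pd

def sample():
    # hypothetical helper: a fresh copy of the question's data
    return pd.DataFrame({'a': [1, 6, 3, 1], 'b': [np.nan, 1, np.nan, np.nan]})

# Assigning through df[mask] targets whole rows, so column a changes too.
whole_rows = sample()
whole_rows[(whole_rows['a'] == 1) & whole_rows['b'].isna()] = 4

# .loc[mask, 'b'] restricts the assignment to column b.
only_b = sample()
only_b.loc[(only_b['a'] == 1) & only_b['b'].isna(), 'b'] = 4

print(whole_rows)
print(only_b)
```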

edited Oct 22 '20 at 05:55

answered Oct 22 '20 at 03:06

drcrisp


AttributeError: module 'numpy' has no attribute 'isnull' – Oct 22 '20 at 03:55
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' – Oct 22 '20 at 03:55
Maybe your entries are strings rather than numbers. You can add: df = df.apply(pd.to_numeric, errors='coerce'). I will modify my answer accordingly. – drcrisp Oct 22 '20 at 05:54
Hi there, I added df = df.apply(pd.to_numeric, errors='coerce'). Nonetheless, the error persisted. – Oct 23 '20 at 00:14
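Some context on the two errors above. `isnull` is a pandas function (pd.isnull, an alias of pd.isna), not a NumPy one, hence the AttributeError. np.isnan is a ufunc that only accepts numeric input, so it raises the TypeError on object-dtype columns, while Series.isna() works for any dtype. The sketch below assumes an object column of strings and NaN, as you might get from text data.

```python
import numpy as np
import pandas as pd

# Assumed example: strings mixed with NaN, stored with object dtype
s = pd.Series([np.nan, '1', np.nan, np.nan], dtype=object)

# np.isnan does not support object input and raises TypeError
try:
    np.isnan(s)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# Series.isna() handles any dtype
print(s.isna().tolist())  # [True, False, True, True]

# pd.to_numeric converts to float64; unparseable values would become NaN
s_num = pd.to_numeric(s, errors='coerce')
print(s_num.dtype)  # float64
```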

score 0 · Answer 3 · answered Oct 22 '20 at 03:13

You can create a mask and then fill in the intended NaNs using that mask:

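import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 6, 3, 1], 'b': [np.nan, 1, np.nan, np.nan]})
# True for rows where a == 1 and b is NaN (label access; x[0] positional access is deprecated)
mask = df[['a', 'b']].apply(lambda x: (x['a'] == 1) and pd.isna(x['b']), axis=1)
# Where mask is True, take b filled with 4; elsewhere keep b as-is
df['b'] = df['b'].mask(mask, df['b'].fillna(4))
print(df)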
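The same mask can also be built without a per-row lambda. A sketch of that vectorized variant (not the answerer's code): the boolean Series comes straight from column comparisons, and Series.mask replaces the flagged values with 4.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 6, 3, 1], 'b': [np.nan, 1, np.nan, np.nan]})

# Build the mask from whole-column operations instead of apply()
mask = (df['a'] == 1) & df['b'].isna()
# Series.mask replaces values where the condition is True with the given value
df['b'] = df['b'].mask(mask, 4)
print(df)
```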

score 0 · Answer 4 · answered Oct 22 '20 at 03:20


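# Fill NaNs only in the rows where a == 1.0 (fillna fills every column of those rows)
df2 = df[df['a'] == 1.0].fillna(4.0)
# Take values from df2 where it has them, and fall back to the original df elsewhere
df = df2.combine_first(df)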

Can this help you?
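A self-contained run of this combine_first approach on the sample data, to make its semantics visible: `df2` contains only the rows labelled 0 and 3, and combine_first returns the union of both indexes, taking a value from `df2` where it exists and from `df` otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 6.0, 3.0, 1.0], 'b': [np.nan, 1, np.nan, np.nan]})

# df2 holds only the rows where a == 1.0, with their NaNs filled
df2 = df[df['a'] == 1.0].fillna(4.0)
print(df2.index.tolist())  # [0, 3]

# combine_first keeps df2's values and takes whatever df2 lacks (rows 1 and 2) from df
result = df2.combine_first(df)
print(result)
```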

answered Oct 22 '20 at 03:20

horsefall

