Handling NaN Values in Pandas with Conditional Statement

Question

I'm working with some data where the customer postcode data is invalid. As a result I'm not able to map the CountryISOCode to their postcode resulting in a NaN. However, I have noticed that for all CountryISOCodes with NaN, the CurrencyCode can provide me with enough to fix the problem for now.

I've gone to various Stackoverflow articles but I cannot find the solution to my problem. I've tried...

def func(row):
    if row['CountryISOCode'] == np.nan & row['Currency'] == 'EUR':
        return 'IRE'
elif row['CountryISOCode'] == np.nan & row['Currency'] == 'GBP':
    return 'GBR'
else:
    return row['CountryISOCode']

df['CountryISOCode'] = df.apply(func, axis=1)

and some other methods but to no avail...

Below I have provided a replication of the data I'm working with

import pandas as pd
import numpy as np

data = [
    ['Steve', 'Invalid Postcode', 'GBP', np.nan ],
    ['Robyn', 'Invalid Postcode', 'EUR', np.nan],
    ['James', 'Valid Postcode', 'GBP', 'GBR'],
    ['Halo', 'Invalid Postcode', 'EUR', np.nan],
    ['Jesus', 'Valid Postcode', 'GBP', 'GBR']
    ]

df = pd.DataFrame(columns=["Name", "PostCode", "CurrencyCode", "CountryISOCode"], data=data)

Essentially if I was working with SQL my code would be as follows.

IF countryISOCode IS NULL 
    AND currency = ‘GBP’ 
THEN CountryISOCode =  ‘GBR’
ELSE
IF countryISOCode IS NULL 
    AND currency = ‘EUR 
THEN CountryISOCode =  ‘IRE’
ELSE countryISOCode 
END

Any ideas?

Have you tried changing `&` to `and`? There is a difference in the two and I think you want to use `and` here. You can see more about the difference here [https://stackoverflow.com/questions/22646463/difference-between-and-boolean-vs-bitwise-in-python-why-difference-i] — Rachel, Jan 29 '19 at 15:14
The code runs with `and` but doesn't seem to fix the problem. The data just remains the same. — Ryan Davies, Jan 29 '19 at 15:22

score 4 · Answer 1 · answered Jan 29 '19 at 16:03

I am adding this answer as it adds value to the original question. The reason the comparison statements weren't working is because np.nan == np.nan will not work. You can check for the identity of the NaN element but not equality. See in operator, float("NaN") and np.nan for more detail. With that said, this is how you can transform the original code to make it work as expected.

import pandas as pd                                                                                                                                    
import numpy as np

raw_data = [
    ['Steve', 'Invalid Postcode', 'GBP', np.nan ],
    ['Robyn', 'Invalid Postcode', 'EUR', np.nan],
    ['James', 'Valid Postcode', 'GBP', 'GBR'],
    ['Halo', 'Invalid Postcode', 'EUR', np.nan],
    ['Jesus', 'Valid Postcode', 'GBP', 'GBR']
    ]

df = pd.DataFrame(columns=["Name", "PostCode", "Currency", "CountryISOCode"], data=raw_data)

def func(row):
    if row['CountryISOCode'] is np.nan and row['Currency'] == 'EUR':
        return 'IRE'
    elif row['CountryISOCode'] is np.nan and row['Currency'] == 'GBP':
        return 'GBR'
    else:
        return row['CountryISOCode']

df['CountryISOCode'] = df.apply(func, axis=1)

print(df)

However, the other answers are great also.

yatu · Answer 2 · 2019-01-29T15:21:10.137

3

You could use np.select for this, which allows you to choose from a list depending on the result of a list of conditions:

m1 = df.CountryISOCode.isna()
m2 = df.CurrencyCode.eq('GBP')
m3 = df.CurrencyCode.eq('EUR')
df.loc[:,'CountryISOCode'] = np.select([m1&m2, m1&m3], ['GBP','IRE'], 
                                       default=df.CountryISOCode)

 Name          PostCode CurrencyCode CountryISOCode
0  Steve  Invalid Postcode          GBP            GBP
1  Robyn  Invalid Postcode          EUR            IRE
2  James    Valid Postcode          GBP            GBR
3   Halo  Invalid Postcode          EUR            IRE
4  Jesus    Valid Postcode          GBP            GBR

edited Jan 29 '19 at 15:21

answered Jan 29 '19 at 15:14

yatu

86,083
12
84
139

This sort of works but its filling in a GBP value with IRE which wouldn't work. Index 2 – Ryan Davies Jan 29 '19 at 15:18
Thanks for answering my question! Your solution worked. – Ryan Davies Jan 29 '19 at 15:54

jpp · Accepted Answer · 2019-01-29T15:54:16.393

3

You can use fillna with a dictionary specifying mappings for when currency code is helpful:

cmap = {'GBP': 'GBR', 'EUR': 'IRE'}
df['CountryISOCode'] = df['CountryISOCode'].fillna(df['CurrencyCode'].map(cmap))

print(df)

    Name          PostCode CurrencyCode CountryISOCode
0  Steve  Invalid Postcode          GBP            GBR
1  Robyn  Invalid Postcode          EUR            IRE
2  James    Valid Postcode          GBP            GBR
3   Halo  Invalid Postcode          EUR            IRE
4  Jesus    Valid Postcode          GBP            GBR

edited Jan 29 '19 at 15:54

answered Jan 29 '19 at 15:24

jpp

159,742
34
281
339

1

This solution has fixed my problem in the simplest manner... One minor change is that it should be `cmap = {'GBP': 'GBR', 'EUR': 'IRE'}` – Ryan Davies Jan 29 '19 at 15:53

anky · Answer 4 · 2019-01-29T15:22:33.943

use np.select() for multiple conditions and multiple choices:

df['CountryISOCode']=np.select([(df.CurrencyCode=='GBP')&(df.CountryISOCode.isna()),\
                            (df.CurrencyCode=='EUR')&df.CountryISOCode.isna()],['GBR','IRE'],\
                           default=df.CountryISOCode)

    Name          PostCode CurrencyCode CountryISOCode
0  Steve  Invalid Postcode          GBP            GBR
1  Robyn  Invalid Postcode          EUR            IRE
2  James    Valid Postcode          GBP            GBR
3   Halo  Invalid Postcode          EUR            IRE
4  Jesus    Valid Postcode          GBP            GBR

Thanks for answering my question! Your solution worked. – Ryan Davies Jan 29 '19 at 15:54 — Ryan Davies, Jan 29 '19 at 15:54

score 1 · Answer 5 · answered Jan 29 '19 at 15:34

1

While the other answer using np.select works, my personal favorite is using mask:

df['CountryISOCode'] = df['CountryISOCode'] \
    .mask(df['CountryISOCode'].isna() & df['Currency'].eq('GBP'), 'GBR') \
    .mask(df['CountryISOCode'].isna() & df['Currency'].eq('EUR'), 'IRE')

answered Jan 29 '19 at 15:34

adrtam

6,991
2
12
27

Thanks for answering my question!. – Ryan Davies Jan 29 '19 at 15:54

Handling NaN Values in Pandas with Conditional Statement

5 Answers5