Validate that Values in DataFrame Found in List

Question

What's the best way to validate that the values in a column in a data frame match those in a given list (and, if not, are set to a value accordingly)?

I'm trying to replace invalid values for a field ("B") with the value "(not set)".

Here's what I've tried. The goal is to replace 'f' in 'B' with "(not set)". The trouble is validating against a list of values rather than a single boolean expression.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'b', 'f', 'a', 'b', 'c']})
valid_values = ['a', 'b', 'c', 'd']
validated_data = df.'B' not in [valid_values]
column_name = 'B'
df.loc[validated_data, column_name] = "(not set)"

The above taken from here: Conditional Replace Pandas

I've also tried:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'b', 'f', 'a', 'b', 'c']})
valid_values = ['a', 'b', 'c', 'd']

for x in df.B:
    if x not in valid_values:
        df.B = "(not set)"
...

However, for both I get the error "The truth value of a Series is ambiguous." I tried the recommendations at the following post to no avail.

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Other recommendations greatly appreciated!

You will get more and better answers if you create a [Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example. Especially make sure that the input and expected data are complete (not pseudo-data), and can be easily cut and and paste into an editor to allow testing proposed solutions. — Stephen Rauch, Jan 03 '18 at 03:42
Thank you. Error's answer below does this, but I will also edit the above for future passers by. — le_crease, Jan 03 '18 at 17:54

score 1 · Accepted Answer · answered Jan 03 '18 at 10:52

You need to use the isin method of Series and DataFrame objects.

Consider this example:

df = pd.DataFrame({'A': [1, 2, 3,4,5,6], 'B': ['a', 'b', 'f','a','b','c']})

Resulting in this DataFrame:

Let's say we want to remove invalid values in column B and those invallid values are a en b.

When we use the isin method we can run this code:

mask = df['B'].isin(['a','b'])
df.loc[mask,'B'] = 'not set'

Resulting in this DataFrame:

    A   B
0   1   not set
1   2   not set
2   3   f
3   4   not set
4   5   not set
5   6   c

The isin method checks if the values in a Series or DataFrame object are in the given iterable. See the docs for more information.

Great, thank you very much. I adjusted `mask = df['B'].isin(['a','b'])` to be `mask = ~df['B'].isin(['a','b'])` so that it is checking if a value is NOT in the given list. — le_crease, Jan 03 '18 at 17:50

Validate that Values in DataFrame Found in List

1 Answers1