What's the best way to validate that the values in a column in a data frame match those in a given list (and, if not, are set to a value accordingly)?
I'm trying to replace invalid values for a field ("B") with the value "(not set)".
Here's what I've tried. The goal is to replace 'f' in 'B' with "(not set)". The trouble is validating against a list of values rather than a single boolean expression.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'b', 'f', 'a', 'b', 'c']})
valid_values = ['a', 'b', 'c', 'd']
validated_data = df.'B' not in [valid_values]
column_name = 'B'
df.loc[validated_data, column_name] = "(not set)"
The above taken from here: Conditional Replace Pandas
I've also tried:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'b', 'f', 'a', 'b', 'c']})
valid_values = ['a', 'b', 'c', 'd']
for x in df.B:
if x not in valid_values:
df.B = "(not set)"
...
However, for both I get the error "The truth value of a Series is ambiguous." I tried the recommendations at the following post to no avail.
Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Other recommendations greatly appreciated!