One improvement on the answers above is to evaluate the entire column first and then report every failure case at once.
The following returns a filtered dataframe of all rows where the Gender column is neither 'M' nor 'F':
import pandas as pd
df = pd.DataFrame({"MaritalStatus":["M","S","F"],"Gender":["M","S","F"]})
df.loc[~df.loc[:,"Gender"].isin(['M','F']),:]
>>> MaritalStatus Gender
1 S S
The same can be done for marital status:
df.loc[~df.loc[:,"MaritalStatus"].isin(['M','S','D']),:]
>>> MaritalStatus Gender
2 F F
If you're spot-checking the data for unexpected values, you can then get the values that fail these conditions:
expected_values = {"MaritalStatus":['M','S','D'],"Gender":['M','F']}
for feature in expected_values:
    print(f"The following unexpected values were found in {feature} column:",
          set(df.loc[~df.loc[:,feature].isin(expected_values[feature]),:][feature]))
>>> The following unexpected values were found in MaritalStatus column: {'F'}
>>> The following unexpected values were found in Gender column: {'S'}
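If you want the failures as data rather than printed text, the same loop can collect them into a single dataframe. This is only a sketch: the helper name `collect_unexpected` and the output column names are my own choices, not part of pandas.

```python
import pandas as pd

def collect_unexpected(df, expected_values):
    """Return one row per unexpected value: column name, row index, value."""
    frames = []
    for column, allowed in expected_values.items():
        # Series of the values in this column that are not in the allowed list
        bad = df.loc[~df[column].isin(allowed), column]
        frames.append(
            pd.DataFrame({"column": column, "index": bad.index, "value": bad.to_numpy()})
        )
    return pd.concat(frames, ignore_index=True)

df = pd.DataFrame({"MaritalStatus": ["M", "S", "F"], "Gender": ["M", "S", "F"]})
expected_values = {"MaritalStatus": ["M", "S", "D"], "Gender": ["M", "F"]}

report = collect_unexpected(df, expected_values)
print(report)
```

Note that `isin` does not match missing values against a list of strings, so any NaN in a checked column will also show up in the report.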
Alternatively, you can use the pandera library, which lets you declare expectations for your dataset and validate the data against them. Passing lazy=True to validate collects all the failure cases and reports them together, instead of raising an error at the first failed check.
import pandera as pa
schema = pa.DataFrameSchema(
    {
        "MaritalStatus": pa.Column(pa.String, checks=pa.Check.isin(["M","S","D"])),
        "Gender": pa.Column(pa.String, checks=pa.Check.isin(["M","F"]))
    },
    strict=False
)
schema.validate(df,lazy=True)
>>>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Daten\venv\lib\site-packages\pandera\schemas.py", line 592, in validate
error_handler.collected_errors, check_obj
pandera.errors.SchemaErrors: A total of 2 schema errors were found.
Error Counts
------------
- schema_component_check: 2
Schema Error Summary
--------------------
                                                    failure_cases  n_failure_cases
schema_context column        check
Column         Gender        isin({'F', 'M'})                 [S]                1
               MaritalStatus isin({'M', 'D', 'S'})            [F]                1