I have the following DataFrame and need to validate that no row has both the Name and Tag columns null at the same time. I tried the following, but the indices reported as failing are 0 and 2.
import pandas as pd
import pandera as pa
data = [['Alex',10,'t1'],['Bob',12,None],['Clarke',13,'t3'],[None,14,'t3'],[None,15,None]]
df = pd.DataFrame(data,columns=['Name','Age','Tag'])
schema = pa.DataFrameSchema(
    checks=pa.Check(lambda df: ~(pd.notnull(df["Name"]) & pd.notnull(df["Tag"])))
)
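try:
    # lazy=True collects all failures and raises SchemaErrors (plural),
    # which is the exception caught below
    schema.validate(df, lazy=True)
except pa.errors.SchemaErrors as err:
    print("Schema errors and failure cases:")
    print(err.failure_cases)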
I want the code above to report index 4 as the failure. How should I write the check for the pandera schema?
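For reference, here is a minimal sketch of the mask I think I am after. A pandera check is expected to return True for rows that pass, so my understanding is that the expression should be False only where both Name and Tag are null. The helper names `not_both_null` and `validate_with_pandera` are made up for illustration, and the pandera part assumes `lazy=True` so that `SchemaErrors` is raised.

```python
import pandas as pd

data = [['Alex', 10, 't1'], ['Bob', 12, None], ['Clarke', 13, 't3'],
        [None, 14, 't3'], [None, 15, None]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Tag'])


def not_both_null(frame: pd.DataFrame) -> pd.Series:
    """Return True for rows that pass: Name and Tag are not both null."""
    return ~(frame["Name"].isnull() & frame["Tag"].isnull())


# Plain-pandas sanity check of the mask, independent of pandera
mask = not_both_null(df)
failing_index = mask[~mask].index.tolist()
print(failing_index)


def validate_with_pandera(frame: pd.DataFrame):
    """Hypothetical wrapper: return pandera's failure cases, or None if valid."""
    import pandera as pa  # imported here so the mask above can be tried alone

    schema = pa.DataFrameSchema(checks=pa.Check(not_both_null))
    try:
        schema.validate(frame, lazy=True)
    except pa.errors.SchemaErrors as err:
        return err.failure_cases
    return None
```

Calling `validate_with_pandera(df)` should then list failure cases for index 4 only, although I have not confirmed how pandera lays out the failure cases for a DataFrame-level check.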