Is it possible to validate a column based on another column using Pandera?
My dataframe looks like this:

import pandas as pd

df = pd.DataFrame({
    "Name": ["Thomas","",""],
    "Address": ["Address 1", "Address 1", "Address 3"],
    "Zip": ["65989", "65989", "65954"],
    "External": [False, True, False],
})

I would like to validate the "Name" column based on the "External" column: if "External" is True, "Name" may be empty; otherwise it must not be. In this example, the third record should be invalid because "External" is False and the name is missing.
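To make the rule concrete, here is a minimal sketch of the condition in plain pandas. This is not Pandera code, only the row-wise logic I am trying to express, and the variable name `invalid_name` is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Thomas", "", ""],
    "Address": ["Address 1", "Address 1", "Address 3"],
    "Zip": ["65989", "65989", "65954"],
    "External": [False, True, False],
})

# A row is invalid only when it is NOT external and its name is empty.
invalid_name = (~df["External"]) & (df["Name"].str.len() == 0)

# Show the offending rows; only the third record should appear.
print(df.loc[invalid_name, ["Name", "External"]])
```

What I am looking for is a way to get Pandera to report exactly these rows as failures of the "Name" column only.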

A related question here suggests using wide (DataFrame-level) checks. However, with that approach every column of the third record is reported as invalid (Name, Address, Zip and External), whereas I need only "Name" to be flagged and the rest ignored.

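import pandera as pa
import pandera.extensions as extensions

# Register the custom check before the schema refers to it.
@extensions.register_check_method(check_type="element_wise")
def is_external(row: pd.Series) -> bool:
    # At the DataFrame level, an element-wise check receives one row at a time.
    # A row fails only when it is not external and its name is empty.
    return bool(row["External"]) or len(row["Name"]) > 0

schema_ = pa.DataFrameSchema({
    "Name": pa.Column(str),
    "Address": pa.Column(str),
    "Zip": pa.Column(str),
    "External": pa.Column(bool),
},
checks=[pa.Check.is_external()])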

I also tried something like this in the schema:

     "Name": pa.Column(str, checks=[
         pa.Check.is_external(df["External"]),
     ]),

But in this case the whole "External" column ([False, True, False]) is passed to the function at once, and I am not sure how to compare each value with the corresponding value in the "Name" column.

Is this kind of check possible in Pandera? Thanks in advance!

Nebiros