Given this minimal example:
import pandas as pd
import pandera as pa

class MultiIndexTestSchema(pa.SchemaModel):
    boolean_index_one: pa.typing.Index[bool] = pa.Field(coerce=True)
    boolean_index_two: pa.typing.Index[bool] = pa.Field(coerce=True)
    value: pa.typing.Series[int] = pa.Field()

df = pd.DataFrame({
    "boolean_index_one": [True, False, True, True, False],
    "boolean_index_two": [True, True, True, True, True],
    "value": [1, 2, 3, 4, 5],
})
df = df.set_index(keys=["boolean_index_one", "boolean_index_two"])

MultiIndexTestSchema.validate(df)
I expect the validation to pass, since both index levels contain only boolean values.
Instead, I get the following error:
    raise schema_error from original_exc
pandera.errors.SchemaError: expected series 'boolean_index_one' to have type bool, got object
This issue seems to occur only with MultiIndex DataFrames and boolean fields. Changing the type from bool to int resolves the issue.
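To narrow this down, here is a small diagnostic sketch that uses only pandas, with no pandera involved. It prints the dtype that pandas reports for each level of the MultiIndex. The name `level_dtypes` is just an illustrative helper variable. I have left out the expected output because the reported dtype may differ between pandas versions.

```python
import pandas as pd

# Same frame as in the minimal example, with both boolean columns as index levels.
df = pd.DataFrame({
    "boolean_index_one": [True, False, True, True, False],
    "boolean_index_two": [True, True, True, True, True],
    "value": [1, 2, 3, 4, 5],
}).set_index(["boolean_index_one", "boolean_index_two"])

# Collect the dtype pandas assigns to each level of the MultiIndex.
level_dtypes = {
    name: df.index.get_level_values(name).dtype
    for name in df.index.names
}

for name, dtype in level_dtypes.items():
    print(f"{name}: {dtype}")
```

If a level is reported as object rather than bool, that would match the "got object" part of the error message. It would also suggest that the mismatch already exists in the index itself, before the schema is applied.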
Does anyone know a workaround for this, or am I misunderstanding something about how to define the schema?
Thanks in advance!
I am using pandera version 0.14.4.