
Given this minimal example:

import pandas as pd
import pandera as pa

class MultiIndexTestSchema(pa.SchemaModel):
    boolean_index_one: pa.typing.Index[bool] = pa.Field(coerce=True)
    boolean_index_two: pa.typing.Index[bool] = pa.Field(coerce=True)
    value: pa.typing.Series[int] = pa.Field()

df = pd.DataFrame({
    "boolean_index_one": [True, False, True, True, False], 
    "boolean_index_two": [True, True, True, True, True],
    "value": [1, 2, 3, 4, 5],
})
df = df.set_index(keys=["boolean_index_one", "boolean_index_two"])
MultiIndexTestSchema.validate(df)

I expected the validation to pass, since both index levels contain only boolean values.

However, instead I get the following error:

raise schema_error from original_exc
pandera.errors.SchemaError: expected series 'boolean_index_one' to have type bool, got object

This issue seems to occur only with MultiIndex DataFrames and only with boolean fields. Changing the type from bool to int makes the error go away.
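
One way to narrow this down might be to check which dtype pandas itself reports for each index level, independent of pandera. The snippet below is only a diagnostic sketch using plain pandas: the names `level_dtypes` and `column_dtypes` are made up for illustration, the reported dtypes may differ between pandas versions, and it is not confirmed that this is what causes the error above.

```python
import pandas as pd

# Same data as in the minimal example above.
df = pd.DataFrame({
    "boolean_index_one": [True, False, True, True, False],
    "boolean_index_two": [True, True, True, True, True],
    "value": [1, 2, 3, 4, 5],
})
df = df.set_index(keys=["boolean_index_one", "boolean_index_two"])

# Dtype of each MultiIndex level as pandas reports it.
# (Hypothetical helper name; the result may depend on the pandas version.)
level_dtypes = {
    name: df.index.get_level_values(name).dtype
    for name in df.index.names
}
print(level_dtypes)

# For comparison: dtypes after moving the index levels back into columns.
column_dtypes = df.reset_index().dtypes.to_dict()
print(column_dtypes)
```

If the index levels are reported as object while the same data is reported as bool once it is back in regular columns, that would at least be consistent with the "got object" part of the error message.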

I would like to know whether anyone knows a workaround for this, or whether I am misunderstanding something about how the schema should be defined.

Thanks in advance!

I am using pandera version 0.14.4.

Erik Lundin
