My Pandas dataframes need to adhere to the following Pandera schema:
import pandera as pa
from pandera.typing import Series

class schema(pa.SchemaModel):
    name: Series[str]
    id: Series[str]
However, in some dataframes the "id" column contains only integers, so pd.read_csv() infers the int64 datatype for it.
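To illustrate the inference, here is a minimal sketch using made-up CSV contents (the names and values are hypothetical, not my real data). It only shows how pd.read_csv() picks the dtype of "id" depending on what the column contains:

```python
import io

import pandas as pd

# Hypothetical CSV contents, invented purely for illustration.
numeric_ids = io.StringIO("name,id\nalice,1\nbob,2\n")
mixed_ids = io.StringIO("name,id\nalice,1\nbob,a2\n")

df_numeric = pd.read_csv(numeric_ids)
df_mixed = pd.read_csv(mixed_ids)

# All values in "id" parse as integers, so pandas infers an integer dtype.
print(df_numeric["id"].dtype)

# One value ("a2") is not numeric, so the column is read as strings instead.
print(df_mixed["id"].dtype)
```

The first dataframe is the kind that fails validation against the schema above, while the second one passes.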
For example, I have the following dataframe:
When I run schema(df).validate(), I get the error:
pandera.errors.SchemaError: expected series 'id' to have type str, got int64
However, in other cases the dataframe might look something like this:
I would like to account for both situations by allowing the column to have either of the two datatypes.
This is what I tried (but it doesn't seem to be the correct syntax, as the validation method won't run):
import pandera as pa
from pandera.typing import Series
from typing import Union

class schema(pa.SchemaModel):
    name: Series[str]
    id: Union[Series[str], Series[int]]
Is there any way to do this in Pandera?