How can I use Pandera to assert whether a column has one of multiple data types?

Asked Aug 28 '23 at 14:01

Active Aug 29 '23 at 07:46

Viewed 30 times

My Pandas dataframes need to adhere to the following Pandera schema:

import pandera as pa
from pandera.typing import Series

class schema(pa.SchemaModel):
    name: Series[str]
    id: Series[str]

However, in some dataframe instances, the "id" column will only contain integers and thus will get the "int" datatype when using pd.read_csv().

For example, I have the following dataframe:

When I run schema(df).validate() I get the error: pandera.errors.SchemaError: expected series 'id' to have type str, got int64

However, in other cases the dataframe might look something like this:

I would like to account for both situations by allowing the column to be either of the two data types.

This is what I tried (but it doesn't seem to be the correct syntax, as the validation method won't run):

import pandera as pa
from pandera.typing import Series
from typing import Union

class schema(pa.SchemaModel):
    name: Series[str]
    id: Union[Series[str], Series[int]]

Is there any way to do this in Pandera?

edited Aug 29 '23 at 07:46

asked Aug 28 '23 at 14:01

Neele22

What error do you get specifically? _(Could be helpful to share a sample dataframe you are trying to validate.)_ – Yaakov Bressler Aug 28 '23 at 17:23
I updated my question, hopefully this helps! – Neele22 Aug 29 '23 at 07:46


0 Answers