
My Pandas dataframes need to adhere to the following Pandera schema:

```python
import pandera as pa
from pandera.typing import Series

class schema(pa.SchemaModel):
    name: Series[str]
    id: Series[str]
```

However, in some DataFrames the "id" column contains only integers, so pd.read_csv() infers an integer dtype (int64) for it.
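The dtype inference described above can be seen with two small in-memory CSVs. The file contents below are made up for illustration. The last step shows a pandas-side workaround that is not part of the question: passing `dtype={"id": str}` to `pd.read_csv()` keeps the column as strings whatever its contents.

```python
import io

import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical CSV contents standing in for the two cases in the question.
INTS_ONLY = "name,id\nalice,1\nbob,2\ncarol,3\n"
MIXED = "name,id\nalice,1\nbob,A2\ncarol,3\n"

df_ints = pd.read_csv(io.StringIO(INTS_ONLY))
df_mixed = pd.read_csv(io.StringIO(MIXED))

# pandas infers an integer dtype when every value parses as an integer...
print(df_ints["id"].dtype)
# ...but falls back to a string-like dtype as soon as one value does not.
print(df_mixed["id"].dtype)

# Forcing the dtype at read time keeps "id" as strings in both cases.
df_forced = pd.read_csv(io.StringIO(INTS_ONLY), dtype={"id": str})
print(df_forced["id"].tolist())  # ['1', '2', '3']
```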

For example, I have the following dataframe:

[Image: a dataframe with columns "name" and "id" and three rows, where "id" is always an integer]

When I run schema(df).validate(), I get the error: pandera.errors.SchemaError: expected series 'id' to have type str, got int64

However, in other cases the dataframe might look something like this:

[Image: a dataframe with columns "name" and "id" and three rows, where "id" is sometimes a string]

I would like to handle both situations by allowing the column to be either of the two datatypes.

This is what I tried, but it doesn't seem to be valid syntax, since the validation won't run:

```python
import pandera as pa
from pandera.typing import Series
from typing import Union

class schema(pa.SchemaModel):
    name: Series[str]
    id: Union[Series[str], Series[int]]
```

Is there any way to do this in Pandera?

Neele22