I have a CSV that contains datetime columns, and I want to use Pandera to validate the columns and parse them to the correct format. An example value in the column would be: 2023-02-04T00:39:00+00:00.
This is currently parsed in pandas to the right format using the following Python code:
column = pd.to_datetime(column, format="%Y-%m-%dT%H:%M:%S%z")  # %z matches the "+00:00" offset
column = column.dt.tz_convert("Europe/Amsterdam")
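To make the current behaviour concrete, here is a minimal, self-contained sketch that runs the same two steps on the sample value. The column name is made up for illustration, and I include %z in the format string on the assumption that recent pandas versions need the format to cover the UTC offset:

```python
import pandas as pd

# Sample value from the CSV; "datetime_column" is an illustrative name.
column = pd.Series(["2023-02-04T00:39:00+00:00"], name="datetime_column")

# Parse the ISO 8601 string; %z consumes the "+00:00" offset,
# so the result is timezone-aware.
column = pd.to_datetime(column, format="%Y-%m-%dT%H:%M:%S%z")

# Convert from the parsed offset to the target timezone.
column = column.dt.tz_convert("Europe/Amsterdam")

print(column.iloc[0])  # 2023-02-04 01:39:00+01:00
```

This tz-aware result is what I would like the schema to produce and validate.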
I would like to define a Pandera DataFrameSchema so that the parsing is handled "automatically" when I read the CSV with the following code:
import pandas as pd
from pandera import Column, DataFrameSchema

schema = DataFrameSchema(
{
"datetime_column": Column() # how to implement the above here??
},
strict=True,
coerce=False,
)
df = pd.read_csv(src, dtype={col: str(dtype) for col, dtype in schema.dtypes.items()})
schema.validate(df)
I already use the above approach for simple types like strings, ints, etc. But how would I do this for datetime types (usually tz-aware)?
The online documentation on this is sparse, and I haven't been able to figure it out from it so far.