Questions tagged [pandera]

pandera provides a flexible and expressive API for performing data validation on dataframes to make data processing pipelines more readable and robust.

38 questions
5
votes
3 answers

Pandera validate get all valid rows

I am trying to use the pandera library (I am very new to it) for pandas dataframe validation. What I want to do is ignore the rows that are not valid according to the schema. How can I do that? For example, the pandera schema looks like this: import…
3
votes
1 answer

How to validate dataframe in pandera using multiple columns

I have the following dataframe. I need to validate it to check whether any rows have both the Name and tag columns NULL at the same time. I tried the following, but the indexes where it fails are 0 & 2. import pandas as pd import pandera as pa data =…
user3376169
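The row-wise condition described in the question above can be sketched in plain pandas, as a minimal illustration and not the asker's code or an accepted answer. The sample data and the helper name `rows_with_both_null` are hypothetical; only the column names `Name` and `tag` come from the excerpt.

```python
import pandas as pd


def rows_with_both_null(df, first="Name", second="tag"):
    """Return the index labels of rows where both columns are null."""
    # Element-wise AND: a row is flagged only if both cells are missing.
    both_null = df[first].isna() & df[second].isna()
    return df.index[both_null].tolist()


# Hypothetical sample data, not taken from the question.
data = pd.DataFrame(
    {
        "Name": ["a", None, None],
        "tag": [None, "x", None],
    }
)

failing = rows_with_both_null(data)
print(failing)
```

A boolean expression of this shape, negated so that it is True for valid rows, is the kind of function a dataframe-level check would typically receive.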
3
votes
1 answer

Ingesting a Null Int Column: Pandas and Pandera

I am using pandas with pandera for schema validation, but I've run into a problem since there's a null integer column in the data. from prefect import task, Flow #type:ignore from pandera import Check, Column, DataFrameSchema import…
kn0t
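For context on the null integer column mentioned above, here is a small pandas-only sketch of how missing values affect integer dtypes. It assumes a hypothetical column named `count` and does not reproduce the asker's schema or flow.

```python
import pandas as pd

# Hypothetical column with one missing value.
raw = pd.DataFrame({"count": [1, None, 3]})

# With a missing value present, pandas infers float64 for the column.
inferred = str(raw["count"].dtype)

# Casting to the nullable extension dtype "Int64" (capital I) keeps the
# values as integers and stores the gap as pd.NA.
nullable = raw["count"].astype("Int64")

print(inferred)
print(nullable.dtype)
```

Whether the schema should then declare the column with the nullable integer dtype and mark it as nullable depends on the rest of the pipeline, which the excerpt does not show.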
2
votes
2 answers

Inherit as required only some fields from parent pandera SchemaModel

I have Input and Output pandera SchemaModels, and the Output inherits from the Input, which accurately reflects that all attributes of the Input schema are in the scope of the Output schema. What I want to avoid is inheriting all attributes as required…
Konstantin
1
vote
0 answers

How can I use Pandera to assert whether a column has one of multiple data types?

My Pandas dataframes need to adhere to the following Pandera schema: import pandera as pa from pandera.typing import Series class schema(pa.SchemaModel): name: Series[str] id: Series[str] However, in some dataframe instances, the "id"…
Neele22
1
vote
0 answers

Can Pandera convert my pa.DataFrameModel into a pa.SeriesSchema?

Given this DataFrame import pandera as pa class MyDataframeSchema(pa.DataFrameModel): state: pa.Series[str] = pa.Field() city: pa.Series[str] = pa.Field() price: pa.Series[int] = pa.Field() df = pa.DataFrame[MyDataframeSchema]( …
asiera
1
vote
1 answer

How can I use Pandera to check a Pandas column that might have floats or ints

I am trying to set up a DataFrameSchema in Pandera. The catch is that one of the columns of data may be a float or an int, depending on what data source was used to create the dataframe. Is there a way to set up a check on such a column? This code…
wdchild
1
vote
2 answers

Create empty pandas dataframe from pandera DataFrameModel

Is there a way to create an empty pandas dataframe from a pandera schema? Given the following schema, I would like to get an empty dataframe as shown below: from pandera.typing import Series, DataFrame class MySchema(pa.DataFrameModel): state:…
MJA
1
vote
1 answer

pytest issue with pandera

I wrote a test to experiment with pandera for DataFrame validation. I put the validation schema in a pytest fixture and passed it to the unit test I had. Now, I have this odd issue: when I pip install pandera into my virtual environment, pytest…
eebina
1
vote
1 answer

How to define a Pandera DataFrame schema for validating and parsing datetime columns?

I have a csv that contains datetime columns and I want to use Pandera to validate the columns and parse them to the correct format. An example value in the column would be: 2023-02-04T00:39:00+00:00. This is currently parsed in pandas to the right…
Yorian
1
vote
0 answers

Pandera SchemaError when validating multiindex

Given this minimal example import pandas as pd import pandera as pa class MultiIndexTestSchema(pa.SchemaModel): boolean_index_one: pa.typing.Index[bool] = pa.Field(coerce=True) boolean_index_two: pa.typing.Index[bool] =…
Erik Lundin
1
vote
2 answers

Getting "TypeError: type of out argument not recognized: " when using class function with Pandera decorator

I am trying to use decorators from the Python package "Pandera" and I am having trouble getting them to work with classes. First I create schemas for Pandera: from pandera import Column, Check import yaml in_ = pa.DataFrameSchema( { …
illuminato
1
vote
0 answers

How to enforce Decimal dtype in pandas DataFrame

How can I strictly enforce a Decimal dtype in a pandas DataFrame? To clarify: I am not looking for weak workarounds, such as rounding every time I write to or read from a column (and hoping that no other operations happened elsewhere that might lead to…
KingOtto
1
vote
1 answer

Pandera SchemaModels don't seem to inherit Config

I've built some pandera schema models that inherit from one another, but it seems that pandera SchemaModels don't inherit the Config from one another. Is this by design or am I doing something wrong? For example: from pandera.typing import…
1
vote
2 answers

Pandera/PySpark DataFrame error: TypeError: Unary ~ can not be applied to booleans

I am trying to use pandera to validate a pyspark data frame's schema and coming across an unexpected error when validating constraints on a date column -- made all the more confusing by the fact that the error is not raised when applying the…
sam