1

I've built some pandera schema models that inherit from one another, but it seems that pandera SchemaModels don't inherit the Config from one another. Is this by design or am I doing something wrong?

For example:

from pandera.typing import Series
import pandera as pa

class Model1(pa.SchemaModel):
    col1: Series[int]
    class Config:
        unique=['col1']

class Model2(Model1):
    col2: Series[str]

With the above SchemaModels defined, I would expect that Model2 would also have an attribute of Model2.Config.unique that was equal to ['col1'] but that doesn't seem to get inherited. For every inherited subclass is it expected to re-define the Config and re-inherit what is defined in the parent classes?

  • A cursory sift through their [issue tracker](https://github.com/unionai-oss/pandera/issues) did not give any results. This may or may not be intentional. If I were you, I'd search there a bit and then post an issue about this. In the meantime you can always inherit inside `Model2` directly: `class Config(Model1.Config):` – Daniil Fajnberg Oct 23 '22 at 17:12
  • Thanks. That's what I'm currently doing. Good suggestion and I'll open an issue there. – Kyle Hansen Oct 24 '22 at 15:21

1 Answers1

1

First off, it should be noted that during validation, everything resolves correctly (as also noted in the discussion in one of the project's issues #983). The explanation below the example deals with implementation details which all happen under the hood, but they are important to understand in order to answer your question.

import pandas as pd
import pandera as pa
from pandera.typing import Series

class A(pa.SchemaModel):
    foo: Series[int]

    class Config:
        unique = ["foo"]
        coerce = False

class B(A):
    bar: Series[int]

    class Config:
        coerce = True


df = pd.DataFrame({"foo": ["1", "1"], "bar": ["1", "1"]})

print(B.validate(df, lazy=True))  # coercion to int works, uniqueness check on "foo" fails

So, config properties do propagate from parents to children, albeit using a config-assembling method that walks the mro instead of relying on inheritance. The Config attribute is bound to the default if it wasn't defined. While causing some confusion, this also has some mild upsides.

The Config being only part of the recipe and not the source of truth fits with the overall design, because a SchemaModel does not implement any validation itself -- it is a blueprint to create a DataFrameSchema instance. The actual config is cached lazily at __config__ (essentially behaving like a class-property) on every SchemaModel and is not written back to Config.

It probably could have, but it seems more sane not to. If inheritance isn't done, pretending that it took place isn't necessary and might ultimately introduce weird edge cases or bugs.

Arne
  • 17,706
  • 5
  • 83
  • 99