I have a simple pydantic model with nested data structures.
I want to be able to save and load instances of this model as .json files.
All models inherit from a Base class with a simple configuration:

import pydantic

class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'  # forbid use of extra kwargs
There are some simple data models with inheritance:

class Thing(Base):
    thing_id: int

class SubThing(Thing):
    name: str

And a Container class, which holds a Thing:

class Container(Base):
    thing: Thing
I can create a Container instance and save it as .json:

# make instance of container
c = Container(
    thing=SubThing(
        thing_id=1,
        name='my_thing')
)
json_string = c.json(indent=2)
print(json_string)
"""
{
  "thing": {
    "thing_id": 1,
    "name": "my_thing"
  }
}
"""
However, the JSON string does not record that the thing field was constructed using a SubThing. As such, when I try to load this string into a new Container instance, I get an error:

c = Container.parse_raw(json_string)
print(c)
"""
Traceback (most recent call last):
  File "...", line 36, in <module>
    c = Container.parse_raw(json_string)
  File "pydantic/main.py", line 601, in pydantic.main.BaseModel.parse_raw
  File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Container
thing -> name
  extra fields not permitted (type=value_error.extra)
"""
Is there a simple way to save the Container instance while retaining information about the thing class type such that I can reconstruct the initial Container instance reliably? I would like to avoid pickling the object if possible.
One possible solution is to serialize manually, for example using:

def serialize(attr_name, attr_value, dictionary=None):
    if dictionary is None:
        dictionary = {}
    if not isinstance(attr_value, pydantic.BaseModel):
        dictionary[attr_name] = attr_value
    else:
        sub_dictionary = {}
        for (sub_name, sub_value) in attr_value:
            serialize(sub_name, sub_value, dictionary=sub_dictionary)
        dictionary[attr_name] = {type(attr_value).__name__: sub_dictionary}
    return dictionary
from pprint import pprint

# NOTE: this example assumes Container also declares a `container_name: str` field
c1 = Container(
    container_name='my_container',
    thing=SubThing(
        thing_id=1,
        name='my_thing')
)
pprint(serialize('Container', c1))
"""
{'Container': {'Container': {'container_name': 'my_container',
                             'thing': {'SubThing': {'name': 'my_thing',
                                                    'thing_id': 1}}}}}
"""
However, this gives up most of the benefits of using pydantic for serialization.
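To make the cost of the manual route concrete, here is a minimal sketch of the loader that the tagged format would need. `REGISTRY` and `deserialize` are hypothetical names introduced for illustration and are not part of pydantic. The sketch also assumes that no ordinary dict field ever consists of a single key that happens to match a registered class name.

```python
import json

import pydantic


class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'  # forbid use of extra kwargs


class Thing(Base):
    thing_id: int


class SubThing(Thing):
    name: str


class Container(Base):
    thing: Thing


# Hypothetical registry (not a pydantic feature): class name -> class.
REGISTRY = {cls.__name__: cls for cls in (Thing, SubThing, Container)}


def serialize(attr_name, attr_value, dictionary=None):
    """Manual serializer from the question: tags each model with its class name."""
    if dictionary is None:
        dictionary = {}
    if not isinstance(attr_value, pydantic.BaseModel):
        dictionary[attr_name] = attr_value
    else:
        sub_dictionary = {}
        for (sub_name, sub_value) in attr_value:
            serialize(sub_name, sub_value, dictionary=sub_dictionary)
        dictionary[attr_name] = {type(attr_value).__name__: sub_dictionary}
    return dictionary


def deserialize(value):
    """Rebuild models from the {ClassName: {...fields...}} tagging."""
    if isinstance(value, dict) and len(value) == 1:
        key, fields = next(iter(value.items()))
        # Assumption: a single-key dict keyed by a registered name is a model.
        if key in REGISTRY and isinstance(fields, dict):
            cls = REGISTRY[key]
            return cls(**{k: deserialize(v) for k, v in fields.items()})
    return value


c = Container(thing=SubThing(thing_id=1, name='my_thing'))

# Drop the outer attribute-name wrapper, then round-trip through JSON text.
json_string = json.dumps(serialize('Container', c)['Container'])
restored = deserialize(json.loads(json_string))

print(type(restored.thing).__name__)  # SubThing
```

Every model class has to be registered by hand, and validation only runs when the constructors are called during reconstruction, which illustrates how much of the library's work ends up reimplemented.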