
I'm learning Python but have 13+ years of experience in other languages. Today I studied JSON serialization in Python and, as far as I understand, neither the default json module nor any of the third-party ones out there can serialize an arbitrary class. For example:

import ujson

class Person:
    def __init__(self, name, age, learnsPython, controllers):
        self.name = name
        self.age = age
        self.learnsPython = learnsPython
        self.controllers = controllers

person = Person("Mike", 34, True, ['XboX', 'Super Nintendo'])
print(ujson.dumps(person))

This doesn't work, regardless of which serializer module I use.

Perhaps it's something I just missed at a conceptual level. Or possibly, I'm just spoiled with Java or, say, dotnet having e.g. [System.Web.Extensions]System.Web.Script.Serialization.JavaScriptSerializer.Deserialize<T>() for classic .NetFramework or System.Text.Json.JsonSerializer.Deserialize<T>() for Core.

You just take a type you wish, and [de]serialize it around as much as you need.

Main question: does simple OOB [de]serialization really not exist in Python for any arbitrary type? And if so then why?

Side note after 2 min of googling: I can clearly see a way of writing such a serializer by simply using vars() on any object, possibly recursively, even with the standard json module. And I assume deserialization is also possible in a similar manner, through setattr(). So why does nobody do this?

Mike Makarov
  • Use pickle for arbitrary objects, though note that there may be security concerns. PyYAML would also work if you want it to be text-only, though the same security issues apply. – SuperStormer Mar 27 '21 at 19:30
  • https://stackoverflow.com/a/44777837/7941251 and https://stackoverflow.com/questions/6578986/how-to-convert-json-data-into-a-python-object may work if you specifically want JSON. – SuperStormer Mar 27 '21 at 19:33
  • There is no universal object serialization in Python, full stop. The closest you can get in the standard library is `pickle`, but there are types it doesn't work on. You can install `dill` for more types, but there are still a few types even that can't serialize. JSON can have strings, so you can put pickles in JSON, but beware that unpickling can execute arbitrary code. – gilch Mar 27 '21 at 19:42
  • @SuperStormer pickle does not do json, but there is `jsonpickle` library. It does JSON but adds something like this on top `"py/object": "__main__.Person"`. That's not quite the serialization I'd expect. – Mike Makarov Mar 27 '21 at 19:47
  • what about just doing `print(json.dumps(person.__dict__))` – Chris Doyle Mar 27 '21 at 19:48
  • @gilch Still unclear why this is the case. Python now has generic functions and classes, so why even jsonpickle adds garbage in the resulting json not allowing to simply Deserialize(type(Person))? – Mike Makarov Mar 27 '21 at 19:49
  • @ChrisDoyle that's still manual work. I understand the workarounds exist, I even brought one into the answer. The question is around why the problem exists in the first place? – Mike Makarov Mar 27 '21 at 19:50

2 Answers


(Answering my own question; more details or corrections are welcome.) I have spent some time looking into these questions and I think I have a better understanding now.

  1. Q: Is there no universal object serialization method in Python (to json)?
    A: No, there is not. Quoting @gilch,

There is no universal object serialization in Python, full stop.

The closest equivalent appears to be the jsonpickle module, though it has security considerations: as with pickle, decoding untrusted input can execute arbitrary code. It works for the large majority of objects and can be extended to support e.g. numpy types, too. When the serialized content leaves the application, serialization can be done with unpicklable=False, which produces the plain JSON form of a DTO that one would expect from e.g. dotnet.

string = jsonpickle.dumps(person1, unpicklable=False)
print(string)
>> {"name": "Mike", "age": 34, "learnsPython": true, "controllers": ["XboX", "Super Nintendo"]}

For a round trip, this parameter has to be omitted or set to True, which adds some metadata field(s) to the payload but allows the JSON to be converted back to the source object with loads():

string = jsonpickle.dumps(person1, unpicklable=True)
print(string)
>> {"py/object": "__main__.Person", "name": "Mike", "age": 34, "learnsPython": true, "controllers": ["XboX", "Super Nintendo"]}

There are also a number of alternatives for use with other libraries (e.g. json), namely dumps(myObject.__dict__), dumps(vars(myObject)), etc., up to custom default() hooks and type-specific serializers.
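As a rough illustration of the default() route, here is a minimal sketch using only the standard json module. The Person class with a friend field and the to_json helper are hypothetical names made up for this example. It relies on json calling the default hook for every value it cannot encode natively, so passing vars as that hook also covers nested objects:

```python
import json


class Person:
    def __init__(self, name, age, friend=None):
        self.name = name
        self.age = age
        self.friend = friend  # may hold another Person (illustrative field)


def to_json(obj):
    # Hypothetical helper. json calls `default` for each value it cannot
    # encode; vars(value) returns the instance __dict__, which json then
    # encodes in turn, so nested objects are handled as well.
    return json.dumps(obj, default=vars)


alice = Person("Alice", 30)
mike = Person("Mike", 34, friend=alice)
print(to_json(mike))
```

This only works for objects that have a __dict__; values such as sets or datetime instances make vars() raise a TypeError, which is where type-specific serializers come in.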

  2. Q: Why is this so?
    A: (my speculation here, based on the code I read) Python does not support constructor overloads, so most types do not have a parameterless constructor. Serializing and deserializing therefore requires a custom approach for a number of known types out there. This also calls for strict data contracts that serializers have to adhere to in order to deserialize the objects back - and this is what pickle/jsonpickle does.
    Speculation #2: I assume the decision to put the type into the data, instead of letting the caller specify the type, was made for the sake of data integrity and to follow the established pickle serialization practice.
    Speculation #3: generics (type hints via the typing module) have only existed in Python since 3.5, so relatively recently, and hence are not yet used as a technique for deserialization.
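For what it's worth, a caller-specified type can be approximated even without a parameterless constructor. The following is only a sketch under my own assumptions, not how pickle or jsonpickle work internally: from_json is a hypothetical helper that creates the instance with __new__ (skipping __init__) and then copies the JSON fields over with setattr():

```python
import json


class Person:
    def __init__(self, name, age):  # no parameterless constructor
        self.name = name
        self.age = age


def from_json(cls, text):
    # Hypothetical helper: build an instance without calling __init__,
    # then copy every top-level JSON field onto it.
    obj = cls.__new__(cls)
    for key, value in json.loads(text).items():
        setattr(obj, key, value)
    return obj


mike = from_json(Person, '{"name": "Mike", "age": 34}')
print(type(mike).__name__, mike.name, mike.age)
```

Note that this stays flat: a nested object comes back as a plain dict, because the JSON carries no information about which class a nested field should become. That missing information is what the py/object metadata supplies.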
Mike Makarov
  • I didn't follow all the details of your argument, nor do I know the answer to your question. But I do want to point out that the presence of generics in Python is not relevant. The type system supported by the `typing` module plays no role at runtime; no changes made to it make it any easier or any harder to do universal object (de)serialization. – max Apr 08 '23 at 13:42

The pickle module can be used for binary serialization of most Python objects.

JSON can only represent a subset of Python by default. See the Python documentation's comparison of pickle and JSON for more details.

However, for many purposes, converting the dictionary representation of an object to JSON may be sufficient:

import json

class Person:
    def __init__(self, name, age, learnsPython, controllers):
        self.name = name
        self.age = age
        self.learnsPython = learnsPython
        self.controllers = controllers

person = Person("Mike", 34, True, ['XboX', 'Super Nintendo'])

print(json.dumps(person.__dict__))

Output:

{"name": "Mike", "age": 34, "learnsPython": true, "controllers": ["XboX", "Super Nintendo"]}
  • Well, yes, until you add a child Person object field. And it's still just a workaround. I want to understand why the problem exists in the first place. Why has nobody written a type-agnostic (de)serializer? Well, jsonpickle, yes, but as I wrote above in the comments, it adds type info into the resulting object – Mike Makarov Mar 27 '21 at 19:52
  • Generally, Python programs tend to only serialize specific data instead of the entire object. Serialization is not as necessary or widely used as it is in Java or C#. – Dharma Bellamkonda Mar 27 '21 at 20:55