
Short version: How can I serialize a class reference (the class itself, not an instance) that is stored as a member of an object (see the example below)?

Long version:

I have been using the answer to this question in my work: How can I ignore a member when serializing an object with PyYAML?

So, my current implementation is this:

import logging
from copy import copy

import yaml


class SecretYamlObject(yaml.YAMLObject):
    """Helper class for YAML serialization.
    Source: https://stackoverflow.com/questions/22773612/how-can-i-ignore-a-member-when-serializing-an-object-with-pyyaml """

    # Names of instance attributes that must not be dumped.
    hidden_fields = []

    def __init__(self, *args, **kwargs):
        # Default behavior, so one could just use __setstate__
        self.__setstate__(kwargs)

    @classmethod
    def to_yaml(cls, dumper, data):
        # Work on a shallow copy so the original object keeps its hidden fields.
        new_data = copy(data)
        for item in cls.hidden_fields:
            if item in new_data.__dict__:
                del new_data.__dict__[item]
        return dumper.represent_yaml_object(cls.yaml_tag, new_data, cls, flow_style=cls.yaml_flow_style)

So far, this has been working fine for me because until now I have only needed to hide loggers:

class EventManager(SecretYamlObject):
    yaml_tag = u"!EventManager"
    hidden_fields = ["logger"]

    def __setstate__(self, kw): # For (de)serialization
        self.logger = logging.getLogger(__name__)
        self.listeners = kw.get("listeners",{})
        #...
        return


    def __init__(self, *args, **kwargs):
        self.__setstate__(kwargs)
        return

However, a different problem appears when I try to serialize non-trivial objects. If Q derives directly from object, this works fine, but if it derives from yaml.YAMLObject, it fails with "can't pickle int objects". See this example:

class Q(SecretYamlObject): #works fine if I just use "object"
    pass

class A(SecretYamlObject):
    yaml_tag = u"!Aobj"
    my_q = Q

    def __init__(self, oth_q):
        self.att = "att"
        self.oth_q = oth_q

class B(SecretYamlObject):
    yaml_tag = u"!Bobj"
    my_q = Q
    hidden_fields = ["my_q"]

    def __init__(self, oth_q):
        self.att = "att"
        self.oth_q = oth_q

class C(SecretYamlObject):
    yaml_tag = u"!Cobj"
    my_q = Q
    hidden_fields = ["my_q"]

    def __init__(self, *args, **kwargs):
        self.__setstate__(kwargs)

    def __setstate__(self, kw):
        self.att = "att"
        self.my_q = Q
        self.oth_q = kw.get("oth_q",None)

a = A(Q)
a2 = yaml.load(yaml.dump(a))

b = B(Q)
b2 = yaml.load(yaml.dump(b))

c = C(my_q=Q)
c2 = yaml.load(yaml.dump(c))
c2.my_q
c2.oth_q

A and B give "can't pickle int objects" errors, while C doesn't initialize oth_q (because there is no information about it).
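For comparison, here is a minimal sketch of the case that "works fine". It assumes PyYAML's default Dumper, which (as far as I can tell) has a representer registered for objects whose type is exactly `type` and emits them as a `!!python/name:` tag. A class derived from yaml.YAMLObject has a different metaclass, so that exact match does not apply to it. The names `PlainQ` and `YamlQ` are made up for illustration.

```python
import yaml


class PlainQ:
    """Plain class: its metaclass is exactly `type`."""


class YamlQ(yaml.YAMLObject):
    """YAMLObject subclass: its metaclass is PyYAML's own metaclass."""


# The default Dumper dumps a plain class reference by its qualified name.
text = yaml.dump({"my_q": PlainQ})
print(text)

# The two kinds of class differ in their metaclass, which is what the
# representer lookup is keyed on.
print(type(PlainQ) is type)
print(type(YamlQ) is type)
```

Loading such a `!!python/name:` tag back requires one of PyYAML's non-safe loaders, so this only explains the observed difference; it is not a recommendation.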

Question: How can I preserve the information about which class reference is held?

(I need to hold the class reference to be able to create objects of that type; an alternative approach might work too.)

1 Answer

When loading dumped YAML, you normally don't need to preserve the information about which class needs to be instantiated. That is what tag information, stored in the file with !XObj, is for.

If you hide a reference to an object of a certain class, by not dumping the attribute that refers to it, and then run into problems instantiating that object (because you don't know its class) when loading, you are doing something wrong. In that case you should hide the internals of the referenced object, not the attribute that references the object. You could e.g. dump the referenced object using !XObj null.

By hiding the internals, you will still have the appropriate tag, pointing to the right class to create an object from, when loading. You'll have to decide what your program does with the internals of that object, based on the limited null information.
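One way to apply the tag idea to a class reference itself is sketched below. This is an assumption-laden illustration, not the approach from the answer: the `!class` tag, `CLASS_REGISTRY`, `ClassRefDumper`, `ClassRefLoader`, and both helper functions are hypothetical names. The class is dumped as a tagged scalar holding only its name, and on load the name is looked up in an explicit whitelist, so nothing is imported or executed based on the YAML input.

```python
import yaml


class Q:
    """Stand-in for a class whose reference must survive a round trip."""


class R(yaml.YAMLObject):
    """Second stand-in, derived from YAMLObject like the classes in the question."""


# Hypothetical whitelist: only classes listed here can be restored on load.
CLASS_REGISTRY = {cls.__name__: cls for cls in (Q, R)}


class ClassRefDumper(yaml.SafeDumper):
    """SafeDumper subclass, so the stock SafeDumper is left untouched."""


class ClassRefLoader(yaml.SafeLoader):
    """SafeLoader subclass that additionally understands the !class tag."""


def represent_class_ref(dumper, cls):
    # Emit the class as a tagged scalar holding only its name.
    return dumper.represent_scalar("!class", cls.__name__)


def construct_class_ref(loader, node):
    name = loader.construct_scalar(node)
    try:
        return CLASS_REGISTRY[name]
    except KeyError:
        raise yaml.constructor.ConstructorError(
            None, None, f"unknown class reference {name!r}", node.start_mark
        )


# A multi-representer keyed on `type` matches every class object, including
# classes with a custom metaclass, because the metaclass derives from `type`.
ClassRefDumper.add_multi_representer(type, represent_class_ref)
ClassRefLoader.add_constructor("!class", construct_class_ref)

text = yaml.dump({"att": "att", "my_q": Q, "oth_q": R}, Dumper=ClassRefDumper)
restored = yaml.load(text, Loader=ClassRefLoader)
new_object = restored["my_q"]()  # the restored reference can build instances
```

Keying the lookup on the bare class name is a simplification; two classes with the same name in different modules would need a qualified name in the registry.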

Warning: you should seriously reconsider using yaml.YAMLObject the way you do. You are using load(), which is documented as unsafe. If you cannot guarantee 100% control of your YAML input, now and at any time in the future, you might lose the content of your drive, the secrecy of the objects you try to hide, or worse. You should use safe_load() or move away from a library like PyYAML, which defaults to being unsafe.
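A minimal sketch of combining yaml.YAMLObject with safe_load(), assuming PyYAML's `yaml_loader` and `yaml_dumper` class attributes are used to register the class on the safe loader and dumper. The `Settings` class and its attributes are hypothetical.

```python
import yaml


class Settings(yaml.YAMLObject):
    # Registering on the safe loader and dumper lets safe_load() and
    # safe_dump() handle this tag without switching to an unsafe loader.
    yaml_tag = "!Settings"
    yaml_loader = yaml.SafeLoader
    yaml_dumper = yaml.SafeDumper

    def __init__(self, att, retries):
        self.att = att
        self.retries = retries


text = yaml.safe_dump(Settings("att", 3))
restored = yaml.safe_load(text)
print(text)
print(restored.att, restored.retries)
```

With this setup only explicitly registered tags are constructed; any other object tag in the input makes safe_load() raise an error instead of instantiating something.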

Anthon
  • The thing is, I'm not holding an object reference, I'm holding a class reference. Are you suggesting I hold a null object of the class, and use its type information to generate new members on-the-fly? P.S. I'm aware of PyYAML's unsafety, I'm thinking of moving to Camel or something; would it make more sense to immediately move to a different yaml library instead of fixing the issues here? – Anatoly Makarevich Aug 14 '17 at 23:03
  • I missed that you actually only store a reference (maybe I am better at reading YAML than at reading other people's Python). I would still suggest trying to solve this with YAML's tag facility, maybe by having an extra generator class for each of your referenced classes, for which an object gets dumped. If you haven't done so yet, please take a look at my PyYAML derivative [ruamel.yaml](http://yaml.readthedocs.io/en/latest/) – Anthon Aug 15 '17 at 04:15
  • Thanks for your suggestion. I was able to turn the type reference into a property (as it was just the same in each instance of my particular class) in some of the cases. I also switched to your library after reading up a bit. It took a while to figure out how to make a representer (I still need to figure out how to deserialize, but that's a different problem...). HOWEVER, I still need to hold some type references in other objects, and I can't figure out how to tell ruamel.yaml how to serialize them. How can I register a representer for a type instead of objects of that type? – Anatoly Makarevich Aug 18 '17 at 11:25
  • I want to post some code - it probably makes sense to ask another question and mark this one as answered, right? I'll make a better example there. – Anatoly Makarevich Aug 18 '17 at 11:32
  • @AnatolyMakarevich Yes, a new question is better. I get notified when things are tagged pyyaml/ruamel.yaml, but I will probably not be able to answer until early next week (assuming I know the answer); maybe someone else can before that time. – Anthon Aug 18 '17 at 14:40