I would like some help on dumping a custom class object to a YAML file. A representation of my class looks like below:

from classes.enum_classes import Priority, Deadline

class Test(yaml.YAMLObject):
    
    yaml_tag = u'!RandomTestClass'
 
    def __init__(self):
        
        self._name = ""
        self._request_id = ""
        self._order = None
        self._priority = Priority.medium
        self._deadline = Deadline.label_c

The last two attributes are instances of two other classes, both derived from Enum. I am trying to dump the contents of a Test object to a YAML output file. My __repr__ method for the Test class looks like this:

    def __repr__(self):
        return "(name=%r, request=%r, order=%r, priority=%r, deadline=%r)" % \
               ((str(self.name) + str(self.order)), self.request_id, self.order,
                self.priority.name, self._deadline.name)

Working off the constructors and representers section of the PyYAML documentation and the example provided there (especially considering that the authors use lists for some of the class variables), I would expect my YAML file to display the actual display tags in the __repr__ method, rather than the variable names themselves. This is what I see currently:

--- !ContainerClass
_requests: !!python/object/apply:collections.defaultdict
  args:
  - !!python/name:builtins.list ''
  dictitems:
    '108':
    - !RandomTestClass
      _deadline: &id002 !Deadline '3'
      _name: '108.1'
      _order: '1'
      _priority: &id006 !Priority '1'
      _request_id: '108'
    - !RandomTestClass
      _deadline: &id002 !Deadline '3'
      _name: '108.2'
      _order: '2'
      _priority: &id006 !Priority '1'
      _request_id: '108'
_name: TestContainer

I want it to look like so:

---
Requests:
    - name: '108.1'
    - requestID: '108'
    - priority: '1'
    - order: '1'
    - deadline: '3'
    - name: '108.2' <for the second entry in the list and so on>
Name: ContainerClass

No amount of fiddling around with the __repr__ method or anything else has resulted in the output I would like. So there are two issues I would love to get some help with.

  1. How do I get a sane, readable representation? I am guessing I will have to write a custom representer, so I would appreciate some pointers, since I was not able to find much information on that.

  2. Getting rid of those pointers, or whatever we would want to call them, next to priority and deadline. Priority and Deadline are the two classes referenced in my __init__ method above, and both are Enum subclasses. Since they already have a base class, any attempt to also derive them from yaml.YAMLObject results in an error with mixins. To get around that, some posts suggested I do this:

class Priority(Enum):

    low = 0
    medium = 1
    high = 2


class Deadline(Enum):

    label_a = 0
    label_b = 1
    label_c = 2
    label_d = 3


def priority_enum_representer(dumper, data):
    return dumper.represent_scalar('!Priority', str(data.value))


def deadline_enum_representer(dumper, data):
    return dumper.represent_scalar('!Deadline', str(data.value))


yaml.add_representer(Deadline, deadline_enum_representer)
yaml.add_representer(Priority, priority_enum_representer)

Any information/pointers on solving these two issues will be much much appreciated. Apologies for the long post, but I have learnt that more information generally leads to much more precise help.

UPDATE:

My YAML file is written based on a list of these RandomTestClass objects that are stored in a defaultdict(list) in a ContainerClass.

class ContainerClass(yaml.YAMLObject):
    
    yaml_tag = u'ContainerClass'

    def __init__(self):
        
        self._name = ""
        self._requests = defaultdict(list)

    def __repr__(self):
        
        return "(Name=%r, Requests=%r)" % \
               (self._name, str(self._requests))

    @property
    def requests(self):
        return self._requests

    @requests.setter
    def requests(self, new_req, value=None):

        if type(new_req) is dict:
            self._requests = new_req
        else:
            try:
                self._requests[new_req].append(value)
            except AttributeError:
                # This means the key exists, but the value is a non-list
                # entity. Change existing value to list, append new value
                # and reassign to the same key
                list_with_values \
                    = [self._requests[new_req], value]
                self._requests[new_req] = list_with_values            

The ContainerClass holds instances of Test objects. In another class, which is the entry point for my code containing __main__, I create multiple Test instances, which are then stored in a `ContainerClass` object and dumped out to the YAML file.

# requisite imports here, based on 
# how the files are stored

from classes.Test import Test
from classes.ContainerClass import ContainerClass

class RunTestClass:

if __name__ == '__main__':
    
    yaml_container = ContainerClass()

    test_object_a = Test()
    test_object_a._name = '108.1'
    test_object_a._order = 1
    test_object_a._request_id = '108'

    yaml_container._name = "TestContainer"
    yaml_container._requests[test_object_a._request_id] = test_object_a

    test_object_b = Test()
    test_object_b._name = '108.2'
    test_object_b._order = 2
    test_object_b._request_id = '108'

    yaml_container._name = "TestContainer"
    yaml_container._requests[test_object_b._request_id] = test_object_b

with open(output_file, mode='w+') as outfile:
    for test_class_object in yaml_container._requests:
        yaml.dump(test_class_object, outfile, default_flow_style=False,
                  explicit_start=True, explicit_end=True)

UPDATE:

Adding a single, consolidated file to the question, executable to replicate the issue.

import yaml
from enum import Enum
from collections import defaultdict


class Priority(Enum):

    low = 0
    medium = 1
    high = 2


class Deadline(Enum):

    label_a = 0
    label_b = 1
    label_c = 2
    label_d = 3


def priority_enum_representer(dumper, data):
    return dumper.represent_scalar('!Priority', str(data.value))


def deadline_enum_representer(dumper, data):
    return dumper.represent_scalar('!Deadline', str(data.value))


yaml.add_representer(Deadline, deadline_enum_representer)
yaml.add_representer(Priority, priority_enum_representer)


class Test(yaml.YAMLObject):
    
    yaml_tag = u'!RandomTestClass'
 
    def __init__(self):
        
        self._name = ""
        self._request_id = ""
        self._order = None
        self._priority = Priority.medium
        self._deadline = Deadline.label_c

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, name):
        self._name = name

    @property
    def request_id(self):
        return self._request_id

    @request_id.setter
    def request_id(self, r_id):
        self._request_id = r_id

    @property
    def order(self):
        return self._order

    @order.setter
    def order(self, order):
        self._order = order

    @property
    def priority(self):
        return self._priority

    @priority.setter
    def priority(self, priority):
        self._priority = priority

    @property
    def deadline(self):
        return self._deadline

    @deadline.setter
    def deadline(self, deadline):
        self._deadline = deadline

    def __str__(self):
        return self.name + ", " + self._request_id + ", " + str(self.order) + ", " \
               + str(self.priority) + ", " + str(self.deadline)


class ContainerClass(yaml.YAMLObject):
    
    yaml_tag = u'ContainerClass'

    def __init__(self):
        
        self._name = ""
        self._requests = defaultdict(list)

    def __repr__(self):
        
        return "(Name=%r, Requests=%r)" % \
               (self._name, str(self._requests))

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, name):
        self._name = name

    @property
    def requests(self):
        return self._requests

    def set_requests(self, new_req, value=None):

        if type(new_req) is dict:
            self._requests = new_req
        else:
            try:
                self._requests[new_req].append(value)
            except AttributeError:
                # This means the key exists, but the value is a non-list
                # entity. Change existing value to list, append new value
                # and reassign to the same key
                print("Encountered a single value, converting to a list and appending new value")
                list_with_values \
                    = [self._requests[new_req], value]
                self._requests[new_req] = list_with_values


yaml_container = ContainerClass()
yaml_container.name = "TestContainer"

test_object_a = Test()
test_object_a._name = '108.1'
test_object_a._order = 1
test_object_a._request_id = '108'
yaml_container.set_requests(test_object_a.request_id, test_object_a)

test_object_b = Test()
test_object_b._name = '108.2'
test_object_b._order = 2
test_object_b._request_id = '108'
yaml_container.set_requests(test_object_b.request_id, test_object_b)

with open('test.yaml', mode='w+') as outfile:
    yaml.dump(yaml_container, outfile, default_flow_style=False,
              explicit_start=True, explicit_end=True)

adwaraki
  • Please show the actual code to reproduce your output. The given YAML implies that you do not simply serialize an instance of `Test` but values containing it, and it is unclear how and why you want those surrounding values to vanish. In any case, `__repr__` is not used for serializing your class with YAML, the docs merely use it to show you a nice rendering of the value in the loading examples. – flyx Jan 21 '21 at 11:19
  • @flyx: That is my actual code, with just variables names masked, for the most part. I tried to follow the PyYAML documentation and examples, but I guess I am not sure what you mean by serializing an instance. I am just using ```yaml.dump()``` to do that. – adwaraki Jan 21 '21 at 15:01
  • @flyx: The output YAML file that I produce is being used by someone else's code that I cannot modify, hence the need to reproduce labels and such in the way I have shown. If I do not, I will end up breaking their code. – adwaraki Jan 21 '21 at 15:06
  • Please post a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), without references to undefined variables like `test_class_object`. The code you've shown does not explain where `!!python/object/apply:collections.defaultdict` comes from and generally does not seem to be the code used to produce the output, since it defines explicit start/end while your YAML contains neither. – flyx Jan 21 '21 at 15:50
  • @flyx: Hope that is enough. Might need to make some minor changes depending on the environment, but that is what I could replicate without giving away any actual code. I have left out the getters/setters in TestClass, which I guess are not necessary since you can get, set values as is. – adwaraki Jan 21 '21 at 16:43
  • I do have a general idea of how to tackle the problem. However, you do not supply a single chunk of code that I can copy to a Python file, execute, and get exactly what you show as YAML output. This is what I expect so I can try out whether what I would suggest works. You should remove all code not relevant to the problem, e.g. imports between different files, checks on `__main__`, references to unknown classes `Priority` etc. I tried to piece together your code and make it run, but it output `--- '108'` followed by `...`, and that's far from what you show. Try to enable me to run your code. – flyx Jan 21 '21 at 23:54
  • @flyx: Done. The file at the end executes and reproduces the output. I had made a mistake in the for loop in ```__main__```. Overlooked it. Also, not sure why you said Priority was an unknown class. I did define it in the question above (even before I consolidated it) and it is part of the YAML output. – adwaraki Jan 22 '21 at 08:22

1 Answer

There are different ways to solve this. Generally, you want to define a classmethod to_yaml in ContainerClass that will be called by the dumper, and implement the transition from native structure to YAML node graph there.

A minimal solution would be to just construct the structure you want to have as normal dict with a list in it, and tell the dumper to use that instead of the real structure:

    @classmethod
    def to_yaml(cls, dumper, data):
        # Flatten every Test in every request list into single-key mappings.
        rSeq = []
        for value in data._requests.values():
            for t in value:
                rSeq.extend([
                    {"name": t._name},
                    {"requestID": t._request_id},
                    {"priority": t._priority},
                    {"order": t._order},
                    {"deadline": t._deadline}
                ])
        # Emit a plain YAML mapping instead of a tagged ContainerClass node.
        return dumper.represent_mapping("tag:yaml.org,2002:map",
                                        {"Requests": rSeq, "Name": data._name})

This will give you

---
Name: TestContainer
Requests:
- name: '108.1'
- requestID: '108'
- priority: &id001 !Priority '1'
- order: 1
- deadline: &id002 !Deadline '2'
- name: '108.2'
- requestID: '108'
- priority: *id001
- order: 2
- deadline: *id002
...

PyYAML generates anchors & aliases for priority and deadline because the values refer to the same objects: every Test instance holds the very same enum member. You can avoid those by doing

yaml.Dumper.ignore_aliases = lambda *args : True

before you dump the YAML.
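Assigning to `yaml.Dumper.ignore_aliases` changes the behaviour of every dump in the process. If that is too broad, one narrower variation is to subclass the dumper and pass it explicitly. The following is only a minimal sketch and not part of the original answer; `NoAliasDumper` is a hypothetical name, and the small dict stands in for the real container:

```python
import yaml
from enum import Enum


class Priority(Enum):
    low = 0
    medium = 1
    high = 2


def priority_enum_representer(dumper, data):
    # Same representer as in the question: a tagged scalar holding the value.
    return dumper.represent_scalar('!Priority', str(data.value))


yaml.add_representer(Priority, priority_enum_representer)


class NoAliasDumper(yaml.Dumper):
    # Hypothetical subclass: report every object as "not aliasable",
    # so repeated objects are written out in full each time.
    def ignore_aliases(self, data):
        return True


# Both keys refer to the same enum member object.
doc = {"a": Priority.medium, "b": Priority.medium}

with_aliases = yaml.dump(doc, default_flow_style=False)
without_aliases = yaml.dump(doc, Dumper=NoAliasDumper,
                            default_flow_style=False)

print(with_aliases)     # contains an anchor and an alias
print(without_aliases)  # the tagged scalar is repeated instead
```

Only calls that pass `Dumper=NoAliasDumper` are affected, so other code in the same process that relies on anchors keeps working.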

You can be more sophisticated and iterate the properties instead of hard coding the names, but that is beyond the scope of this answer. If you want to load this YAML again into the same structure, you will need to add another classmethod from_yaml implementing the reverse transformation.
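As a rough illustration of iterating the properties instead of hard-coding the names, the sketch below walks the `property` objects defined on the class and calls each getter on the instance. It is not part of the original answer: the cut-down `Test` class, its constructor arguments, and the decision to turn enum members into plain strings are all assumptions made for the example.

```python
import yaml
from enum import Enum


class Priority(Enum):
    low = 0
    medium = 1
    high = 2


class Test(yaml.YAMLObject):
    yaml_tag = '!RandomTestClass'

    def __init__(self, name, order):
        self._name = name
        self._order = order
        self._priority = Priority.medium

    @property
    def name(self):
        return self._name

    @property
    def order(self):
        return self._order

    @property
    def priority(self):
        return self._priority

    @classmethod
    def to_yaml(cls, dumper, data):
        mapping = {}
        # vars(cls) only lists attributes defined directly on this class;
        # inherited properties would need a walk over the MRO.
        for attr_name, attr in vars(cls).items():
            if isinstance(attr, property):
                value = attr.fget(data)  # run the getter on this instance
                if isinstance(value, Enum):
                    # Assumption: the consumer wants the enum value as a
                    # plain string, without a custom tag.
                    value = str(value.value)
                mapping[attr_name] = value
        return dumper.represent_mapping('tag:yaml.org,2002:map', mapping)


text = yaml.dump(Test('108.1', 1), default_flow_style=False)
print(text)
```

Because the enum members are converted to ordinary strings before they reach the dumper, neither custom tags nor anchors appear for them. Note that the output keys are the property names, so a key such as `requestID` would still need an explicit mapping from the property name.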

flyx
  • Thank you so much! That worked perfectly. I am going to play around with this to suit my actual objects that are a bit more complex than this example. Is there any reference you can point me to for the latter part that you mentioned (which was beyond the scope of this answer)? Much appreciated. – adwaraki Jan 22 '21 at 14:22
  • @adwaraki [This answer](https://stackoverflow.com/a/1215428/347964) enumerates all properties of a class, though you also want to get the properties besides the names (so, the result of the `getattr` call). On those properties, you can then call `fget(data)` to execute the getter on the class instance. – flyx Jan 22 '21 at 16:27
  • +1 for the dumper suggestion. I would say that traditionally `to_yaml` would be an instance method and `from_yaml` the classmethod, but because of its static nature, you could even make it a static method. Great answer! – Jon Feb 08 '22 at 14:47