
I'm using the solution from the related answer to How to auto-dump modified values in nested dictionaries using ruamel.yaml, changed to use the default (round-trip) loader/dumper.

I believe this is a hard problem, since the additional dict subclasses would have to keep the comments as well.

=> Is it possible to prevent comments from being removed when modifying values in nested dictionaries with the ruamel.yaml approach?


Example:

config.yaml:

c:  # my comment
  b:  
   f: 5  
a:
  z: 4
  b: 4  # my comment

code (the same code from How to auto-dump modified values in nested dictionaries using ruamel.yaml, changed to use the default (round-trip) loader/dumper):

#!/usr/bin/env python3

from pathlib import Path
from ruamel.yaml import YAML, representer


class SubConfig(dict):
    def __init__(self, parent):
        self.parent = parent

    def updated(self):
        self.parent.updated()

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            v = SubConfig(self)
            v.update(value)
            value = v
        super().__setitem__(key, value)
        self.updated()

    def __getitem__(self, key):
        try:
            res = super().__getitem__(key)
        except KeyError:
            super().__setitem__(key, SubConfig(self))
            self.updated()
            return super().__getitem__(key)
        return res

    def __delitem__(self, key):
        res = super().__delitem__(key)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            for k, v in arg.items():
                self[k] = v
        for k, v in kw.items():
            self[k] = v
        self.updated()
        return


_SR = representer.RoundTripRepresenter
_SR.add_representer(SubConfig, _SR.represent_dict)


class Config(dict):
    def __init__(self, filename, auto_dump=True):
        self.filename = filename if hasattr(filename, "open") else Path(filename)
        self.auto_dump = auto_dump
        self.changed = False
        self.yaml = YAML()
        self.yaml.indent(mapping=4, sequence=4, offset=2)
        self.yaml.default_flow_style = False
        if self.filename.exists():
            with open(filename) as f:
                self.update(self.yaml.load(f) or {})

    def updated(self):
        if self.auto_dump:
            self.dump(force=True)
        else:
            self.changed = True

    def dump(self, force=False):
        if not self.changed and not force:
            return
        with open(self.filename, "w") as f:
            self.yaml.dump(dict(self), f)
        self.changed = False

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            v = SubConfig(self)
            v.update(value)
            value = v
        super().__setitem__(key, value)
        self.updated()

    def __getitem__(self, key):
        try:
            res = super().__getitem__(key)
        except KeyError:
            super().__setitem__(key, SubConfig(self))
            self.updated()
        return super().__getitem__(key)

    def __delitem__(self, key):
        res = super().__delitem__(key)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            for k, v in arg.items():
                self[k] = v
        for k, v in kw.items():
            self[k] = v
        self.updated()


cfg = Config(Path("config.yaml"))

=> The config.yaml file is rewritten as follows, with its comments removed:

c:
    b:
        f: 5
a:
    z: 4
    b: 4
alper

1 Answer


Yes, it is possible to prevent comments from being lost. The object returned from self.yaml.load() in your Config.__init__() method is not a plain dict but an instance of a subclass thereof (ruamel.yaml.comments.CommentedMap) that includes all of the comment information (in its .ca attribute). And that CommentedMap will have values that are themselves CommentedMap instances (at least with your input).

So what you need to do is change your classes:

class Config(ruamel.yaml.comments.CommentedMap):

and do the same for SubConfig. Then, in the update routine, you should try to copy the .ca attribute (it will be created empty on a CommentedMap, but will not be available on {}).

Make sure you also add a representer for Config, and don't cast to dict in your Config.dump() method.


If you also copy the .fa attribute of the loaded data (also on SubConfig), you'll preserve the flow/block style of the original, and you can do away with the self.yaml.default_flow_style = False.


The above is the theory; in practice there are a few more issues.

Your config.yaml changes although you do not explicitly dump. That is because your auto_dump is True by default. But that also means it dumps on every change, i.e. your config.yaml gets dumped 10 (ten) times while the Config/SubConfig data structure gets built. I changed this to dump only once if auto_dump is True, but I would not recommend even that; instead, dump only if something changed after loading.
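To see where the ten dumps come from, here is a simplified sketch using only the standard library. Node and Root are hypothetical stand-ins for SubConfig and Config, and Root counts the notifications instead of writing a file:

```python
class Node(dict):
    """Dict that reports every mutation to its parent (stand-in for SubConfig)."""

    def __init__(self, parent):
        super().__init__()
        self.parent = parent

    def updated(self):
        self.parent.updated()

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            # Wrap nested dicts so that they report changes as well.
            child = Node(self)
            child.update(value)
            value = child
        super().__setitem__(key, value)
        self.updated()

    def update(self, mapping):
        for k, v in mapping.items():
            self[k] = v
        self.updated()


class Root(Node):
    """Top-level mapping (stand-in for Config) that counts would-be dumps."""

    def __init__(self):
        super().__init__(parent=None)
        self.dump_count = 0

    def updated(self):
        self.dump_count += 1


root = Root()
# Same structure as the config.yaml from the question.
root.update({"c": {"b": {"f": 5}}, "a": {"z": 4, "b": 4}})
print(root.dump_count)  # → 10
```

Every __setitem__ and every update() call notifies the root once, at every nesting level, which is why postponing auto_dump until loading has finished saves nine writes.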

A dict doesn't need initializing, but a CommentedMap does. If you don't initialize it, you get an AttributeError at some point. So you'll have to call super().__init__(self) in the __init__ of each class.

from pathlib import Path
import ruamel.yaml

_SR = ruamel.yaml.representer.RoundTripRepresenter

class SubConfig(ruamel.yaml.comments.CommentedMap):
    def __init__(self, parent):
        self.parent = parent
        super().__init__(self)

    def updated(self):
        self.parent.updated()

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            v = SubConfig(self)
            v.update(value)
            value = v
        super().__setitem__(key, value)
        self.updated()

    def __getitem__(self, key):
        try:
            res = super().__getitem__(key)
        except KeyError:
            super().__setitem__(key, SubConfig(self))
            self.updated()
            return super().__getitem__(key)
        return res

    def __delitem__(self, key):
        res = super().__delitem__(key)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            for k, v in arg.items():
                self[k] = v
            for attr in [ruamel.yaml.comments.Comment.attrib, ruamel.yaml.comments.Format.attrib]:
                if hasattr(arg, attr):
                    setattr(self, attr, getattr(arg, attr))
        for k, v in kw.items():
            self[k] = v
        self.updated()
        return


_SR.add_representer(SubConfig, _SR.represent_dict)


class Config(ruamel.yaml.comments.CommentedMap):
    def __init__(self, filename, auto_dump=True):
        super().__init__(self)
        self.filename = filename if hasattr(filename, "open") else Path(filename)
        self.auto_dump = False  # postpone setting during loading of config
        self.changed = False
        self.yaml = ruamel.yaml.YAML()
        self.yaml.indent(mapping=4, sequence=4, offset=2)
        # self.yaml.default_flow_style = False
        if self.filename.exists():
            with open(filename) as f:
                self.update(self.yaml.load(f) or {})
        self.auto_dump = auto_dump
        if auto_dump:
            self.dump()

    def updated(self):
        if self.auto_dump:
            self.dump(force=True)
        else:
            self.changed = True

    def dump(self, force=False):
        if not self.changed and not force:
            return
        # use the capability of dump to take a Path. It will open the file 'wb' as
        # is appropriate for a YAML file, which is UTF-8
        self.yaml.dump(self, self.filename)
        self.changed = False

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            v = SubConfig(self)
            v.update(value)
            value = v
        super().__setitem__(key, value)
        self.updated()

    def __getitem__(self, key):
        try:
            res = super().__getitem__(key)
        except KeyError:
            super().__setitem__(key, SubConfig(self))
            self.updated()
        return super().__getitem__(key)

    def __delitem__(self, key):
        res = super().__delitem__(key)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            for k, v in arg.items():
                self[k] = v
            for attr in [ruamel.yaml.comments.Comment.attrib, ruamel.yaml.comments.Format.attrib]:
                if hasattr(arg, attr):
                    setattr(self, attr, getattr(arg, attr))
        for k, v in kw.items():
            self[k] = v
        self.updated()

_SR.add_representer(Config, _SR.represent_dict)

fn = Path('config.yaml')
fn.write_text("""
c:  # my comment
  b:
     f: 5
  x: {g: 6}
a:
  z: 4
  b: 4  # my comment
""")
cfg = Config(fn)
print(Path(fn).read_text())

which gives:

c:  # my comment
    b:
        f: 5
    x: {g: 6}
a:
    z: 4
    b: 4 # my comment

Since the input changes, I write the config file anew on every run to test this. I also added a flow-style dict, to make clear that the original formatting is preserved.

Anthon
  • After switching to `ruamel.yaml.comments.CommentedMap`, it gives an error at the `self._ok.add(key)` line: `ruamel/yaml/comments.py", line 936, in __setitem__ AttributeError: _ok` – alper Feb 05 '22 at 21:16
  • That is because you don't initialize the `CommentedMap`. I updated my answer. – Anthon Feb 06 '22 at 07:10
  • Thanks for the complete answer that covers all – alper Feb 06 '22 at 10:41
  • When `c: # my comment` is written as ` c: # my comment` (with 4 spaces before the comment), its comment also has 4 spaces instead of 2 in the output. Is that normal behavior? – alper Feb 06 '22 at 11:19
  • The comments only "know" in what column they started, and `ruamel.yaml` tries to put them back in that column, so normally your neatly aligned comments stay aligned. If that column is no longer available (because of re-indentation or longer YAML data on the line before it), the comment gets pushed back on the line. A comment doesn't have an idea of distance. But I don't guarantee that for comments between keys and their values, and what is done there might change in the future. – Anthon Feb 06 '22 at 12:50