I am following the solution from "PyYAML - Saving data to .yaml files" and trying to modify values in nested dictionaries using ruamel.yaml:

cfg = Config("test.yaml")
cfg['setup']['a'] = 3 
print(cfg)  # I can see the change for the `dict` but it is not saved

The value of cfg['setup']['a'] is changed, but the change is not caught by __setitem__() and therefore not saved via the updated() function.

Would it be possible to auto-dump any change to values in a nested dict?

ex:

  • dict[in_key][out_key] = value
  • cfg['setup']['a']['b']['c'] = 3

PyYAML - Saving data to .yaml files:


import os

from ruamel.yaml import YAML


class Config(dict):
    def __init__(self, filename, auto_dump=True):
        self.filename = filename
        self.auto_dump = auto_dump
        self.changed = False
        self.yaml = YAML()
        self.yaml.preserve_quotes = True
        if os.path.isfile(filename):
            with open(filename) as f:
                super(Config, self).update(self.yaml.load(f) or {})

    def dump(self, force=False):
        if not self.changed and not force:
            return
        with open(self.filename, "w") as f:
            self.yaml.dump(dict(self), f)
        self.changed = False

    def updated(self):
        if self.auto_dump:
            self.dump(force=True)
        else:
            self.changed = True

    def __setitem__(self, key, value):
        super(Config, self).__setitem__(key, value)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            super(Config, self).update(arg)
        super(Config, self).update(**kw)
        self.updated()

alper

1 Answer

You will need to make a secondary class SubConfig that behaves similarly to Config. It is probably a good idea to get rid of the old-style super(Config, self) before that.
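
As a minimal sketch of why the original Config misses the nested assignment (the `TracingDict` class and its `calls` list are made up for illustration and are not part of the original code): `cfg['setup']['a'] = 3` first looks up `'setup'` on the outer object and then calls `__setitem__` on whatever that lookup returned, which here is a plain dict.

```python
class TracingDict(dict):
    """Hypothetical dict subclass that records which special methods are called."""

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("__getitem__", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.calls.append(("__setitem__", key))
        super().__setitem__(key, value)


cfg = TracingDict(setup={"a": 1})
cfg["setup"]["a"] = 3  # outer __getitem__, then the inner plain dict's __setitem__
cfg["top"] = 5         # outer __setitem__

print(cfg.calls)
```

The outer `__setitem__` only shows up for `'top'`, never for `'a'`, so the inner value has to be an object that can report changes itself.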

Change __setitem__ to check whether the value is a dict, and if so instantiate a SubConfig and then set the individual items on it (SubConfig needs to do the same, so you can have arbitrary nesting).

The SubConfig, on __init__, doesn't take a filename, but it takes a parent (of type Config or SubConfig). SubConfig itself shouldn't dump, and its updated() should call the parent's updated() (eventually bubbling up to the Config, which then does the save).

In order to support doing cfg['b']['x'] = 2 when cfg['b'] does not exist yet, you need to implement __getitem__ (so that a missing key gets a new SubConfig), and similarly for del cfg['a'] implement __delitem__, to make it write the updated file.

I thought you could subclass one class from the other as several methods are the same, but couldn't get this to work with super() properly.

If you ever assign lists to (nested) keys, and want to auto-dump on updating an element in such a list, you'll need to implement some SubConfigList and handle those in __setitem__.
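
A rough sketch of what such a SubConfigList could look like, assuming it follows the same parent/updated() convention as SubConfig. The `Parent` class is only a stand-in that counts notifications, and only a few mutating methods are covered here; `insert`, `pop`, `remove`, `sort` and `+=` would need the same treatment. For dumping, the list type would presumably also need a representer registered, like SubConfig does.

```python
class Parent:
    """Stand-in for Config/SubConfig: only counts how often updated() is called."""

    def __init__(self):
        self.update_count = 0

    def updated(self):
        self.update_count += 1


class SubConfigList(list):
    """Hypothetical list that tells its parent about in-place changes."""

    def __init__(self, parent, items=()):
        super().__init__(items)
        self.parent = parent

    def updated(self):
        self.parent.updated()

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self.updated()

    def __delitem__(self, index):
        super().__delitem__(index)
        self.updated()

    def append(self, value):
        super().append(value)
        self.updated()

    def extend(self, values):
        super().extend(values)
        self.updated()


parent = Parent()
lst = SubConfigList(parent, [1, 2, 3])
lst[0] = 10      # notifies the parent
lst.append(4)    # notifies the parent
del lst[1]       # notifies the parent
print(lst, parent.update_count)
```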

from pathlib import Path
import ruamel.yaml

class SubConfig(dict):
    def __init__(self, parent):
        self.parent = parent

    def updated(self):
        self.parent.updated()

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            v = SubConfig(self)
            v.update(value)
            value = v
        super().__setitem__(key, value)
        self.updated()

    def __getitem__(self, key):
        try:
            res = super().__getitem__(key)
        except KeyError:
            super().__setitem__(key, SubConfig(self))
            self.updated()
            return super().__getitem__(key)
        return res

    def __delitem__(self, key):
        super().__delitem__(key)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            for k, v in arg.items():
                self[k] = v
        for k, v in kw.items():
            self[k] = v
        self.updated()

_SR = ruamel.yaml.representer.SafeRepresenter
_SR.add_representer(SubConfig, _SR.represent_dict)

class Config(dict):
    def __init__(self, filename, auto_dump=True):
        self.filename = filename if hasattr(filename, 'open') else Path(filename)
        self.auto_dump = auto_dump
        self.changed = False
        self.yaml = ruamel.yaml.YAML(typ='safe')
        self.yaml.default_flow_style = False
        if self.filename.exists():
            with self.filename.open() as f:
                data = self.yaml.load(f) or {}
            # update after the file is closed, since update() may rewrite it
            self.update(data)

    def updated(self):
        if self.auto_dump:
            self.dump(force=True)
        else:
            self.changed = True

    def dump(self, force=False):
        if not self.changed and not force:
            return
        with open(self.filename, "w") as f:
            self.yaml.dump(dict(self), f)
        self.changed = False

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            v = SubConfig(self)
            v.update(value)
            value = v
        super().__setitem__(key, value)
        self.updated()

    def __getitem__(self, key):
        try:
            res = super().__getitem__(key)
        except KeyError:
            super().__setitem__(key, SubConfig(self))
            self.updated()
        return super().__getitem__(key)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.updated()

    def update(self, *args, **kw):
        for arg in args:
            for k, v in arg.items():
                self[k] = v
        for k, v in kw.items():
            self[k] = v
        self.updated()

config_file = Path('config.yaml') 

cfg = Config(config_file)
cfg['a'] = 1
cfg['b']['x'] = 2
cfg['c']['y']['z'] = 42

print(f'{config_file} 1:')
print(config_file.read_text())

cfg['b']['x'] = 3
cfg['a'] = 4

print(f'{config_file} 2:')
print(config_file.read_text())

cfg.update(a=9, d=196)
cfg['c']['y'].update(k=11, l=12)

print(f'{config_file} 3:')
print(config_file.read_text())
        
# reread config from file
cfg = Config(config_file)
assert isinstance(cfg['c']['y'], SubConfig)
assert cfg['c']['y']['z'] == 42
del cfg['c']
print(f'{config_file} 4:')
print(config_file.read_text())


# start from scratch and immediately use update()
config_file.unlink()
cfg = Config(config_file)
cfg.update(a=dict(b=4))
cfg.update(c=dict(b=dict(e=5)))
assert isinstance(cfg['a'], SubConfig)
assert isinstance(cfg['c']['b'], SubConfig)
cfg['c']['b']['f'] = 22

print(f'{config_file} 5:')
print(config_file.read_text())

which gives:

config.yaml 1:
a: 1
b:
  x: 2
c:
  y:
    z: 42

config.yaml 2:
a: 4
b:
  x: 3
c:
  y:
    z: 42

config.yaml 3:
a: 9
b:
  x: 3
c:
  y:
    k: 11
    l: 12
    z: 42
d: 196

config.yaml 4:
a: 9
b:
  x: 3
d: 196

config.yaml 5:
a:
  b: 4
c:
  b:
    e: 5
    f: 22

You should consider not making these classes subclasses of dict, but instead keeping the dict as an attribute ._d (and replacing super(). with self._d.). This would require a specific representer function/method.

The advantage of that is that you don't get some dict functionality unexpectedly. E.g. in the above subclassing implementation, if I hadn't implemented __delitem__, you could still do del cfg['c'] without an error, but the YAML file would not be written automatically. If the dict is an attribute, you'll get an error until you implement __delitem__.

Anthon
  • What should be arguments for `__setitem__(self, ...)`? – alper Aug 07 '21 at 19:17
  • `__setitem__(self, key, value)` both for `Config` and `SubConfig` – Anthon Aug 08 '21 at 05:37
  • Sorry, I am confused: there will be multiple keys (`*key`), so it does not end up in the `__setitem__(self, key, value)` function – alper Aug 08 '21 at 09:55
  • I don't see any `*key` in your code (or mine). – Anthon Aug 08 '21 at 11:36
  • Like for `cfg['setup']['a'] = 3` there are two keys: the first one is `setup` and the second one is `a`. But `__setitem__(self, key, value)` requires a single `key`, hence it does not enter that function. It could be `cfg['setup']['a']['b']['c'] = 3`, which has 4 keys. – alper Aug 08 '21 at 12:37
  • I think the call should be something like: `cfg['setup', 'a', 'b', 'c'] = 3` – alper Aug 08 '21 at 12:48
  • No. `Config.__getitem__` is called with one key and returns a `SubConfig`, and its `__setitem__` is then called with the next key. – Anthon Aug 08 '21 at 14:11
  • Should `SubConfig` be the base class, as in `class Config(SubConfig)`? – alper Aug 08 '21 at 14:16
  • You could, since several methods are exactly the same. You could also do it the other way around (`class SubConfig(Config):`), but since you probably want to add a representer for `SubConfig` but not for `Config`, that might not work out. – Anthon Aug 08 '21 at 20:47
  • Thanks. Should I also do the setting operations using the `update()` method? – alper Aug 08 '21 at 21:59
  • You need the `update()` to walk over the key/value pairs so it does the right thing when the value is a dict. I made the answer somewhat more extensive, so have a look at that. Some more testing might be needed. – Anthon Aug 09 '21 at 08:27
  • Thanks. Works like magic. If I see something go wrong, I will let you know. Why do we need `_SR`? How does it help? – alper Aug 09 '21 at 09:24
  • `_SR` is just to make the next line shorter. You need to register `SubConfig` to dump like a dict, so that YAML knows how to dump it (`Config` doesn't need to register, as I do `dict(self)`). – Anthon Aug 09 '21 at 13:14
  • If I accidentally terminate the process during the write operation, it may end up writing the file with every key's value set to empty (`{}`). Would it be possible to prevent this? – alper Aug 16 '21 at 11:13
  • Why don't you write to a temporary file, and only when that completes successfully, unlink the real filename and rename the temporary to the real? – Anthon Aug 16 '21 at 15:03
  • Ah, that's a smart, simple solution: writing into `config_temp.yaml` and then moving that over the original file. By `unlink the real filename`, you mean just doing `mv config_temp.yaml config.yaml`, right? – alper Aug 16 '21 at 18:20
  • If you do it in Python there is `os.rename()`, but on Windows it throws an error if the target name exists. So there you should use `os.unlink('config.yaml')` or `os.remove('config.yaml')` first. – Anthon Aug 16 '21 at 21:20
  • Also, we don't need the assignment into `res`, right, since `res` is not used? `res = super().__getitem__(key)` could be just `super().__getitem__(key)` – alper Aug 17 '21 at 23:22
  • I think I wanted to use res to return if the exception was not thrown to prevent double lookup in that case. – Anthon Aug 18 '21 at 04:48
  • I just realized your solution does not save the comments; is that normal? Other than that it works solidly – alper Sep 06 '21 at 11:25
  • Yes, that is normal: if you do `YAML(typ='safe')` you get the fast C-based loader, which doesn't preserve comments. – Anthon Sep 06 '21 at 14:21
  • Would it be OK if I use the non-safe version in order to keep the comments? Is there any risk in that? – alper Sep 06 '21 at 15:21
  • The default is the round-trip-loader, which is a subclass of the safeloader, and of the (unsafe) Loader. The round-trip-loader will not instantiate any, potentially harmful, classes based on tag (instead it makes commentedmap/seq instances and sets their tag). – Anthon Sep 06 '21 at 19:29
  • I have changed `ruamel.yaml.YAML(typ='safe')` into `ruamel.yaml.YAML()` but am now facing the following error: `raise RepresenterError(_F('cannot represent an object: {data!s}', data=data))`. So for this approach, as I understand it, I have to stick with `safe` – alper Sep 07 '21 at 09:02
  • When multiple processes read/write the same `yaml` file, `os.rename()` may cause a problem where one of them renames the temporary file while another is still working on it. Do you advise any lock mechanism that I can use along with `ruamel.yaml`? If required, I can ask a new question about this. @Anthon – alper Sep 18 '21 at 06:39
  • No preference, but you would need to consider lock/re-load/modify/write and make sure you catch any setting of something already set by another process. – Anthon Sep 18 '21 at 13:06
  • Yes that is possible. And **additional** questions are not what comments are for – Anthon Sep 29 '21 at 05:34
  • @alper Not sure what you try to edit, but it was unacceptable to change all of the code lines as indicated by SO. Please don't do that. Someone else had already rejected your edit, and so did I. – Anthon Sep 29 '21 at 12:10
  • @alper You don't have to comment, I get notified automatically if a question tagged [ruamel.yaml] is posted. – Anthon Nov 19 '21 at 11:47
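
The temporary-file approach suggested in the comments could be sketched roughly as follows. This is an illustration, not part of the answer's code: the helper name `atomic_write_text` is made up, and it uses `os.replace()` instead of unlink-then-rename, since `os.replace()` overwrites an existing target on both POSIX and Windows. It does not address the multi-process locking question.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text):
    """Write text to a temp file next to path, then swap it into place."""
    path = Path(path)
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)  # overwrites an existing target
    except BaseException:
        # Remove the temp file if the swap did not happen.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "config.yaml"
    atomic_write_text(target, "a: 1\n")
    atomic_write_text(target, "a: 2\n")
    final_text = target.read_text()
    leftovers = [p.name for p in Path(d).iterdir()]

print(final_text, leftovers)
```

In `Config.dump()` the same idea would mean dumping the YAML into the temporary file and only replacing `self.filename` once the dump has completed, so an interrupted write leaves the previous file intact.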