
I have this example YAML file:

---
test:
  name: "Tom"
  age: "5"
  version: "1.0"

How can I change this YAML file to this:

---
test:
  name: "Max"
  age: "10"
  version: "2.2"

This is the way I open the file:

import yaml

with open("config.yml", 'r') as stream:
    print(yaml.load(stream))

But I have no idea how to edit the YAML file now.

Anthon
L.Kersting

2 Answers


Since you are using PyYAML, a straightforward way to do this is:

#!/usr/bin/env python

import yaml

with open("testfile.yaml", 'r') as stream:
    try:
        loaded = yaml.load(stream)
    except yaml.YAMLError as exc:
        print(exc)

# Modify the fields from the dict
loaded['test']['name'] = "Max"
loaded['test']['age'] = "10"
loaded['test']['version'] = "2.2"

# Save it again
with open("modified.yaml", 'w') as stream:
    try:
        yaml.dump(loaded, stream, default_flow_style=False)
    except yaml.YAMLError as exc:
        print(exc)

So you load the YAML into a dict called loaded, modify the values you need, and then save it (overwriting the original file or not, your call). For nested input you get a nested dict to modify. The default_flow_style=False parameter is necessary to produce the format you want (block style); otherwise the leaf collections are written in flow style:

A: a
B: {C: c, D: d, E: e}
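
For anyone unsure which style is which, here is a small sketch that dumps the same nested dict with the three possible values of default_flow_style. The sample data is made up for illustration. As far as I know, PyYAML 5.1 and later already default to False, so None is passed explicitly here to reproduce the mixed output shown above.

```python
import yaml

# Hypothetical sample data: one scalar and one nested (leaf) mapping
data = {"A": "a", "B": {"C": "c", "D": "d", "E": "e"}}

# None: only leaf collections are written in flow style
leaf_flow = yaml.safe_dump(data, default_flow_style=None)

# False: every collection is written in block style
block = yaml.safe_dump(data, default_flow_style=False)

# True: every collection is written in flow style
flow = yaml.safe_dump(data, default_flow_style=True)

print(leaf_flow)
print(block)
print(flow)
```

The first result matches the two-line example above, the second is fully indented block style, and the third is a single JSON-like line.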

Cheers!

Later edit:

As Anthon pointed out, my answer has some flaws.

  • It's better to use safe_load instead of load, since the latter is potentially dangerous.

  • The output needs a directives end indicator (those three dashes at the beginning). To add them, we use explicit_start=True in the dump method (which should actually be safe_dump).

  • Consider using ruamel.yaml instead of yaml if you want to generate better output (although the two outputs are semantically the same).

See Anthon's answer for more detailed information, since he's the author of the package.
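
Putting the first two points together, a minimal sketch of an in-place update could look like the following. The helper name update_yaml_file and the throwaway demo file are my own inventions for illustration, not part of PyYAML.

```python
import os
import tempfile

import yaml


def update_yaml_file(path, section, updates):
    """Load a YAML file, update one top-level mapping, and write it back."""
    with open(path) as stream:
        data = yaml.safe_load(stream)
    data[section].update(updates)
    # Overwrite the original file, block style, with a leading '---'
    with open(path, "w") as stream:
        yaml.safe_dump(data, stream, default_flow_style=False,
                       explicit_start=True)
    return data


# Demo on a temporary copy of the file from the question
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfile.yaml")
    with open(path, "w") as stream:
        stream.write('---\ntest:\n  name: "Tom"\n  age: "5"\n  version: "1.0"\n')
    update_yaml_file(path, "test",
                     {"name": "Max", "age": "10", "version": "2.2"})
    with open(path) as stream:
        result = stream.read()

print(result)
```

The values are updated and the file starts with the three dashes, but the quoting and key order of the original are not kept.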

Adrian Pop
  • You should not be using PyYAML's `load`, it is potentially unsafe. The format the OP wants is called block style not flow-style, you have those two terms confused. Also if you don't specify `default_flow_style` only the leaf nodes would be affected **not** all nested collections. Your output doesn't have the leading `directives end indicator`. -1 for the aforementioned errors in your answer, +1 for at least using the recommended extension for YAML files. – Anthon Feb 21 '19 at 10:24
  • @Anthon Thanks for pointing those out. Is it okay to modify my answer and add some info about `ruamel.yaml` and `safe_load`? Also, I did not know about `explicit_start=True` parameter, I just found it in the documentation. *Edit*: I just saw you posted a more detailed answer. – Adrian Pop Feb 21 '19 at 10:38
  • Sure you can use that, it is probably better to update the answer to at least have the `safe_load` in there (for anyone using copy-and-paste), but fixing the other things would be good as well. My preference is not to use sections like `Later edit`, just try to keep a single coherent whole. The edit history is there in case anyone wants to know "how it used to be before improvements" – Anthon Feb 21 '19 at 11:58

The PyYAML documentation warns that using the load() function is potentially dangerous. Since you (like almost everybody else) don't need it, the first thing to do is to use safe_load() instead.

You should also rename your input file to config.yaml; the recommended extension for YAML files has been .yaml since 2006.

Knowing that, this is the way to change a config.yaml file using PyYAML:

import yaml

with open('config.yaml') as stream:
    data = yaml.safe_load(stream)

test = data['test']
test.update(dict(name="Tom", age="10", version="2.2"))

with open('output.yaml', 'wb') as stream:
    yaml.safe_dump(data, stream, default_flow_style=False,
                   explicit_start=True, allow_unicode=True, encoding='utf-8')

This will get you an output.yaml that looks like:

---
test:
  age: '10'
  name: Tom
  version: '2.2'

The default_flow_style parameter is necessary to avoid a JSON-like structure for your leaf-node mapping. The explicit_start parameter gives you the leading directives end indicator (---). I also recommend always using allow_unicode=True, encoding='utf-8' (and opening the file as binary), so that you don't run into surprises or problems when you change name to Björk Guðmundsdóttir.
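
To illustrate what those two Unicode options change, here is a small sketch that dumps the same name with and without them. It dumps to a return value instead of a file, which is an assumption made only to keep the example self-contained.

```python
import yaml

data = {"name": "Björk Guðmundsdóttir"}

# Without allow_unicode, non-ASCII characters are written as escape sequences
escaped = yaml.safe_dump(data)

# With allow_unicode and an encoding, the result is UTF-8 encoded bytes,
# which is why the output file is opened in binary mode
raw = yaml.safe_dump(data, allow_unicode=True, encoding="utf-8")

print(escaped)
print(raw.decode("utf-8"))
```

The first print shows a double-quoted string full of backslash escapes; the second shows the name as typed.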

Now as you will notice, this doesn't generate the output that you want (although semantically the same):

  • single quotes instead of double quotes around the strings that could be interpreted as numbers
  • no double quotes around Tom
  • sorting of the keys of your mapping

If you had any comments in the YAML file, these would have been lost.
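
A quick way to see these effects is to round-trip a string instead of a file. This is only a sketch: the comments in the sample input are made up, and the sort_keys=False argument is, to my knowledge, only available in PyYAML 5.1 and later.

```python
import yaml

# Sample input with hypothetical comments added
source = """\
---
# settings for the test user
test:
  name: "Tom"  # display name
  age: "5"
  version: "1.0"
"""

data = yaml.safe_load(source)
data["test"].update(name="Max", age="10", version="2.2")

# Default behaviour: keys sorted, quotes normalised, comments gone
output = yaml.safe_dump(data, default_flow_style=False, explicit_start=True)

# Assumption: PyYAML >= 5.1, which accepts sort_keys to keep insertion order
unsorted = yaml.safe_dump(data, default_flow_style=False,
                          explicit_start=True, sort_keys=False)

print(output)
print(unsorted)
```

Even with the key order kept, the comments and the original double quotes do not come back.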

The better way to update YAML files is to use ruamel.yaml (disclaimer: I am the author of that package), which has saner defaults than PyYAML, handles YAML 1.2, and doesn't drop comments (if you have any in your file):

import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.explicit_start = True

with open('config.yaml') as stream:
    data = yaml.load(stream)

test = data['test']
test.update(dict(name="Tom", age="10", version="2.2"))

with open('output.yaml', 'wb') as stream:
    yaml.dump(data, stream)

With that, your output file will be:

---
test:
  name: "Tom"
  age: "10"
  version: "2.2"

which is exactly what you wanted.

Anthon