
I want to use Python to read and write YAML frontmatter in markdown files. I have come across the ruamel.yaml package but am having trouble understanding how to use it for this purpose.

If I have a markdown file:

---
car: 
  make: Toyota
  model: Camry
---

# My Ultimate Car Review
This is a good car.

For one, is there a way to set the yaml data to variables in my python code?

Second, is there a way to set new values to the yaml in the markdown file?

For the first, I have tried:

from ruamel.yaml import YAML
import sys

f = open("cars.txt", "r+") # I'm really not sure if r+ is ideal here.

yaml = YAML()
code = yaml.load(f)
print(code['car']['make'])

but get an error:

ruamel.yaml.composer.ComposerError: expected a single document in the stream
  in "cars.txt", line 2, column 1
but found another document
  in "cars.txt", line 5, column 1

For the second, I have tried:

from ruamel.yaml import YAML
import sys

f = open("cars.txt", "r+") # I'm really not sure if r+ is ideal here.

yaml = YAML()
code = yaml.load(f)
code['car']['model'] = 'Sequoia'

but get the same error:

ruamel.yaml.composer.ComposerError: expected a single document in the stream
  in "cars.txt", line 2, column 1
but found another document
  in "cars.txt", line 5, column 1
Anthon
pruppert
  • The author of ruamel.yaml indicates [here](https://stackoverflow.com/a/40359935/1611844) that his/her package can be used on files with yaml frontmatter. However, the example given is too complex for me to parse. I'm hoping the package developer or someone else can give a simpler example. – pruppert Jun 14 '21 at 03:18
  • This [question](https://stackoverflow.com/questions/14359557/reading-yaml-file-with-python-results-in-yaml-composer-composererror-expected-a) will help. You have two documents, so you have to use `load_all`. – gaurav Jun 14 '21 at 03:51

1 Answer


When you have multiple YAML documents in one file, these are separated by a line consisting of three dashes, or starting with three dashes followed by a space. Most YAML parsers, including ruamel.yaml, expect either a single-document file (when using YAML().load()) or a multi-document file (when using YAML().load_all()).

The method .load() returns the single data structure, and complains if there seems to be more than one document (i.e. when it encounters the second --- in your file). The .load_all() method can handle one or more YAML documents, but always returns an iterator.

Your input happens to be a valid multi-document YAML file, but the markdown part usually prevents that. It could always have been valid YAML by changing the second --- into --- |, thereby making the markdown part a (multi-line) literal scalar string. I have no idea why the designers of such YAML frontmatter formats didn't specify that; it might have to do with the fact that some parsers (like PyYAML) fail to parse such non-indented literal scalar strings at the root level correctly, although examples of those are in the YAML specification.

In your example the markdown part is so simple that it is valid YAML without the | for a literal scalar string, so you could use .load_all() on this input. But just adding e.g. a line starting with a dash to the markdown section will result in an invalid YAML document, so if you use .load_all(), you have to make sure you do not iterate so far as to parse the second document:

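from pathlib import Path
import ruamel.yaml

path = Path('cars.txt')

yaml = ruamel.yaml.YAML()
# load_all() is lazy: stop after the first document (the frontmatter),
# so the markdown part is never parsed as YAML
for data in yaml.load_all(path):
    break
print(data['car']['make'])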

which gives:

Toyota

You shouldn't try to update the file in place, however (so don't use r+): your updated YAML frontmatter might be longer than the original, and writing it back would overwrite the start of your markdown. For updating, read the file into memory, split it into two parts at the second line of dashes, update the data, dump it, and append the dashes and markdown:

import sys
from pathlib import Path
import ruamel.yaml

path = Path('cars.txt')
opath = Path('cars_out.txt')
yaml_str, markdown = path.read_text().lstrip().split('\n---', 1)
yaml_str += '\n' # re-add the trailing newline that was split off

yaml = ruamel.yaml.YAML()
yaml.explicit_start = True
data = yaml.load(yaml_str)

data['car']['year'] = 2003

with opath.open('w') as fp:
    yaml.dump(data, fp)
    fp.write('---')
    fp.write(markdown)

sys.stdout.write(opath.read_text())

which gives:

---
car:
  make: Toyota
  model: Camry
  year: 2003
---

# My Ultimate Car Review
This is a good car.
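
If you need this for many files, the split step above can be wrapped in a small helper that also copes with files that have no frontmatter. This is only a sketch: the name `split_frontmatter` and the choice to return `None` for missing frontmatter are assumptions, not part of ruamel.yaml.

```python
def split_frontmatter(text):
    """Return (yaml_str, markdown); yaml_str is None without frontmatter.

    Hypothetical helper using the same '\n---' split as above.
    """
    stripped = text.lstrip()
    # Frontmatter must begin with a line of three dashes.
    if not stripped.startswith('---'):
        return None, text
    parts = stripped.split('\n---', 1)
    # No second line of dashes: treat the whole file as markdown.
    if len(parts) != 2:
        return None, text
    yaml_str, markdown = parts
    # Re-add the trailing newline that was split off.
    return yaml_str + '\n', markdown
```

Writing `'---' + markdown` after the dumped YAML restores the rest of the file, as in the code above. Note that `'\n---'` also matches a line that merely starts with three dashes (e.g. `----`), so stricter matching may be needed for unusual frontmatter.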
Anthon
  • BTW it is entirely possible to tweak the parser so that it will automatically insert the token for literal scalar after the second `---`. But since both `---` and `--- |` could occur in the markdown, resulting in more than two documents, it would be cumbersome to try and write such markdown part back from parsed YAML. – Anthon Jun 14 '21 at 07:35
  • Thank you for the detailed answer. This works. I had to insert `fp.write('---\n')` to line before `yaml.dump(data, fp)` to preserve the initial `---` in the output file. I plan to write the output directly back to the input file. It seems to work in this case, but any gotchas to be mindful of when doing this? Thanks! – pruppert Jun 14 '21 at 11:01
  • I forgot about the initial document start indicator; ruamel.yaml doesn't preserve that (it is superfluous in a single-document YAML file). I updated the answer using the `.explicit_start` attribute. You can do `opath = path` without a problem. Everything is read into memory, and you can overwrite the output. I don't do that during testing, because I would have to reconstruct the input file every time. – Anthon Jun 14 '21 at 11:38
  • If you have a huge file, it is faster/better to use `read_bytes` and `.open('wb')` and adjust the rest to use bytes instead of converting to/from Unicode. – Anthon Jun 14 '21 at 11:41