76

I have a yaml file that looks like this:

# The following key opens a door
key: value

Is there a way I can load and dump this data while maintaining the comment?

Harley Holcombe
  • 1
    I once modified the C libyaml code to emit comments for my own use. Extending this to PyYAML is not going to be easy. – David Heffernan Aug 31 '11 at 16:37
  • I thought about this again. Does it make sense to parse and write a yaml file which was edited by hand (and will be edited by hand in the future)? Why not split the file into two parts: One is handcrafted and the other part is pure data (without comments). Related: https://github.com/guettli/programming-guidelines/blob/master/README.rst#source-code-generation-is-a-stupid-idea – guettli Sep 16 '19 at 08:02

4 Answers

124

If you are using block-structured YAML, you can use the Python package¹ ruamel.yaml, which is a derivative of PyYAML and supports round-trip preservation of comments:

import sys
import ruamel.yaml

yaml_str = """\
# example
name:
  # details
  family: Smith   # very common
  given: Alice    # one of the siblings
"""

yaml = ruamel.yaml.YAML()  # defaults to round-trip if no parameters given
code = yaml.load(yaml_str)
code['name']['given'] = 'Bob'

yaml.dump(code, sys.stdout)

with result:

# example
name:
  # details
  family: Smith   # very common
  given: Bob      # one of the siblings

Note that the end-of-line comments are still aligned.

Instead of normal list and dict objects, the loaded data consists of wrapped versions² to which the comments are attached.

¹ Install with pip install ruamel.yaml. Works on Python 2.6/2.7/3.3+
² ordereddict is used in case of a mapping, to preserve ordering

Anthon
  • This does not answer OP's question. It preserves order, but not comments. – Cerin Jan 22 '17 at 22:58
  • 12
    @cerin Which comments are missing when you run the above code? With what version of Python, ruamel.yaml and on which platform did you run this code? I just retried this with the latest version of ruamel.yaml (in case I broke something) and the output still includes comments. Given the amount of upvotes here, I think others have been able to get the same result, and that you might have overlooked something. – Anthon Jan 23 '17 at 06:35
  • 1
    @Anthon, It seems to be inconsistent. In your example, the comments are preserved, but in more complicated yaml files I've tested, it strips out some comments, especially if you edit data near those comments. I'm using the most recent version of the package with Python 2.7. – Cerin Jan 24 '17 at 15:22
  • 12
    @cerin The round tripping was originally there to update values in a configuration file, and that should always work. Given the way things are "preserved", comments might disappear if you start deleting keys. I would prefer if you asked a question about that on Stack Overflow or filed a bug report. I tend to fix things if I can and try to give workarounds, or at least an explanation if I can't. – Anthon Jan 24 '17 at 19:27
  • 1
    If someone wants more standard YAML formatting, you can use `yaml.indent(mapping=2, sequence=4, offset=2)` before doing `dump` – The Godfather Apr 01 '20 at 09:34
33

PyYAML throws away comments at a very low level (in Scanner.scan_to_next_token).

While you could adapt or extend it to handle comments in its whole stack, this would be a major modification. Dumping (=emitting) comments seems to be easier and was discussed in ticket 114 on the old PyYAML bug tracker.

As of 2023, the feature request for loading comments is still stalled.
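The comment loss described above is easy to reproduce. This is a small sketch using the YAML from the question and PyYAML's safe_load / safe_dump; the variable names are only for illustration.

```python
import yaml  # PyYAML

yaml_str = """\
# The following key opens a door
key: value
"""

# The scanner skips the comment, so only the mapping reaches the loader.
data = yaml.safe_load(yaml_str)
print(data)

# Dumping the plain dict therefore cannot bring the comment back.
dumped = yaml.safe_dump(data)
print(dumped)
```

The loaded object is an ordinary dict, so nothing about the comment is left by the time the data is dumped.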

dpr
phihag
  • 2
    You are right it was a major modification, although easier because I dropped < 2.6 support and recombined the Py2 and Py3 sources. – Anthon Nov 24 '14 at 11:00
5

I have a branch of pyyaml that does exactly this. https://github.com/pflarr/pyyaml

To build a yaml file with comments, you have to create an event stream that includes comment events. Comments are currently only allowed before sequence items and mapping keys.

This currently only works for Python 3. I haven't ported it to the Python 2 version of the library, but could easily do so on request. It should also be fairly easy to port to the C libyaml, as the Python code is a simple port of that anyway.

Paul Ferrell
1

If you are not constrained by a file schema, you can pick a specific key pattern to mean "ignored entry". For example, your YAML ingestion logic can filter out any entry whose key starts with '~':

company:
  ~name: Must be the legal name
  name: Curious Adventures
  ~address: the official correspondence address
  address: 1234, New York, PO 1234

I have used this approach for JSON files too, since they have the same issue with comments.
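A minimal sketch of such a filter, assuming the file has already been parsed into plain dicts and lists by whatever YAML or JSON loader you use. The helper name strip_ignored and the IGNORED_PREFIX constant are hypothetical, not part of any library.

```python
from typing import Any

IGNORED_PREFIX = "~"  # hypothetical marker for "comment" entries


def strip_ignored(node: Any) -> Any:
    """Return a copy of the parsed data without keys that start with the marker."""
    if isinstance(node, dict):
        return {
            key: strip_ignored(value)
            for key, value in node.items()
            if not (isinstance(key, str) and key.startswith(IGNORED_PREFIX))
        }
    if isinstance(node, list):
        return [strip_ignored(item) for item in node]
    return node  # scalars are returned unchanged


# Stand-in for the result of loading the example file above.
raw = {
    "company": {
        "~name": "Must be the legal name",
        "name": "Curious Adventures",
        "~address": "the official correspondence address",
        "address": "1234, New York, PO 1234",
    }
}

clean = strip_ignored(raw)
print(clean)
```

The "comment" entries survive a load and dump cycle because they are ordinary data; the filter is only needed where the application consumes the values.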