Most YAML parsers are built for reading YAML, either written by other programs or edited by humans, and for writing YAML to be read by other programs. What is notoriously lacking is the ability of parsers to write YAML that is still readable by humans:
- the order of mapping keys is undefined
- comments get thrown away
- the scalar literal block style, if any, is dropped
- spacing around scalars is discarded
- the scalar folding information, if any, is dropped
Loading a dump of a loaded handcrafted YAML file results in the same internal data structures as the initial load, but the intermediate dump normally doesn't look like the original (handcrafted) YAML.
If you have a Python program:
import ruamel.yaml as yaml

yaml_str = """\
receipt: Oz-Ware Purchase Invoice
date: 2007-08-06
customer:
    given: Dorothy
items:
    - part_no: A4786
      descrip: Water Bucket (Filled)
    - part_no: E1628
      descrip: High Heeled "Ruby" Slippers
      size: 8
bill-to: &id001
    street: |
        123 Tornado Alley
        Suite 16
    city: East Centerville
    state: KS
ship-to: *id001
specialDelivery: >
    Follow the Yellow Brick
    Road to the Emerald City.
"""

data1 = yaml.load(yaml_str, Loader=yaml.Loader)
dump_str = yaml.dump(data1, Dumper=yaml.Dumper)
data2 = yaml.load(dump_str, Loader=yaml.Loader)
Then the following assertions hold:
assert data1 == data2
assert dump_str != yaml_str
The intermediate dump_str looks like:
bill-to: &id001 {city: East Centerville, state: KS, street: '123 Tornado Alley

    Suite 16

    '}
customer: {given: Dorothy}
date: 2007-08-06
items:
- {descrip: Water Bucket (Filled), part_no: A4786}
- {descrip: High Heeled "Ruby" Slippers, part_no: E1628, size: 8}
receipt: Oz-Ware Purchase Invoice
ship-to: *id001
specialDelivery: 'Follow the Yellow Brick Road to the Emerald City.

  '
The above is the default behaviour for ruamel.yaml, PyYAML, many YAML parsers in other languages, and online YAML conversion services. For some parsers this is the only behaviour provided.
The reason for me to start ruamel.yaml as an enhancement of PyYAML was to make going from handcrafted YAML to internal data and back to YAML result in something that is more human-readable (what I call round-tripping) and that preserves more information (especially comments).
data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
print(yaml.dump(data, Dumper=yaml.RoundTripDumper))
gives you:
receipt: Oz-Ware Purchase Invoice
date: 2007-08-06
customer:
  given: Dorothy
items:
- part_no: A4786
  descrip: Water Bucket (Filled)
- part_no: E1628
  descrip: High Heeled "Ruby" Slippers
  size: 8
bill-to: &id001
  street: |
    123 Tornado Alley
    Suite 16
  city: East Centerville
  state: KS
ship-to: *id001
specialDelivery: 'Follow the Yellow Brick Road to the Emerald City.

  '
My focus has been on comments, key order, and literal block style. Spacing around scalars and folded scalars are not (yet) treated specially.
Starting from there (you could also do this in PyYAML, but then you would not have ruamel.yaml's built-in key-order preservation), you can either provide special emitters or hook into the system at a lower level by overriding some methods in emitter.py, while making sure you can still call the originals for the cases you don't need to handle:
def rewrite_write_plain(self, text, split=True):
    # Only touch plain scalars that are values of a block mapping.
    if self.state == self.expect_block_mapping_simple_value:
        text = '###' + text + '###'
        # Left-pad with spaces so that values line up around column 20.
        while self.column < 20:
            text = ' ' + text
            self.column += 1
    self._org_write_plain(text, split)


def rewrite_write_literal(self, text):
    if self.state == self.expect_block_mapping_simple_value:
        # Keep a trailing newline outside of the '###' markers.
        last_nl = False
        if text and text[-1] == '\n':
            last_nl = True
            text = text[:-1]
        text = '###' + text + '###'
        if False:  # disabled: extra indentation of the literal block
            extra_indent = ''
            while self.column < 15:
                text = ' ' + text
                extra_indent += ' '
                self.column += 1
            text = text.replace('\n', '\n' + extra_indent)
        if last_nl:
            text += '\n'
    self._org_write_literal(text)


def rewrite_write_single_quoted(self, text, split=True):
    if self.state == self.expect_block_mapping_simple_value:
        last_nl = False
        if text and text[-1] == u'\n':
            last_nl = True
            text = text[:-1]
        text = u'###' + text + u'###'
        if last_nl:
            text += u'\n'
    # Emit as a folded scalar instead of a single-quoted one.
    self.write_folded(text)


def rewrite_write_indicator(self, indicator, need_whitespace,
                            whitespace=False, indention=False):
    # Anchors (&) and aliases (*) are written as indicators.
    if indicator and indicator[0] in u"*&":
        indicator = u'###' + indicator + u'###'
        while self.column < 20:
            indicator = ' ' + indicator
            self.column += 1
    self._org_write_indicator(indicator, need_whitespace, whitespace,
                              indention)


# Patch the round-trip dumper class, keeping the originals reachable.
dumper = yaml.RoundTripDumper
dumper._org_write_plain = dumper.write_plain
dumper.write_plain = rewrite_write_plain
dumper._org_write_literal = dumper.write_literal
dumper.write_literal = rewrite_write_literal
dumper._org_write_single_quoted = dumper.write_single_quoted
dumper.write_single_quoted = rewrite_write_single_quoted
dumper._org_write_indicator = dumper.write_indicator
dumper.write_indicator = rewrite_write_indicator

print(yaml.dump(data, Dumper=dumper, indent=4))
gives you:
receipt:             ###Oz-Ware Purchase Invoice###
date:                ###2007-08-06###
customer:
    given:           ###Dorothy###
items:
-   part_no:         ###A4786###
    descrip:         ###Water Bucket (Filled)###
-   part_no:         ###E1628###
    descrip:         ###High Heeled "Ruby" Slippers###
    size:            ###8###
bill-to:             ###&id001###
    street: |
        ###123 Tornado Alley
        Suite 16###
    city:            ###East Centerville###
    state:           ###KS###
ship-to:             ###*id001###
specialDelivery: >
    ###Follow the Yellow Brick Road to the Emerald City.###
which is hopefully acceptable for further processing in C#.
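The hook itself (save the original method under another name, then install a replacement that delegates to it) does not depend on ruamel.yaml. A minimal sketch of just that pattern follows. The `Emitter` class and its `in_value` flag are hypothetical stand-ins for the real emitter and its state check, not part of any library:

```python
class Emitter(object):
    """Hypothetical stand-in for a YAML emitter (not ruamel.yaml's class)."""

    def __init__(self):
        self.in_value = False  # stand-in for the emitter's state check
        self.chunks = []       # collected output

    def write_plain(self, text):
        self.chunks.append(text)


def rewrite_write_plain(self, text):
    # Only decorate mapping values; everything else passes through unchanged.
    if self.in_value:
        text = '###' + text + '###'
    # Delegate to the saved original so the normal writing logic still runs.
    self._org_write_plain(text)


# Save the original under a new name, then install the replacement.
Emitter._org_write_plain = Emitter.write_plain
Emitter.write_plain = rewrite_write_plain

emitter = Emitter()
emitter.write_plain('receipt')   # a key: left alone
emitter.in_value = True
emitter.write_plain('Oz-Ware')   # a value: wrapped in markers
result = emitter.chunks
```

Because the patch is applied to the class, it affects every instance created from it. If that is too invasive, subclassing and calling the parent method is the gentler alternative.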