
I'm writing a Python list to a YAML file. My list looks like:

my_list = ["a", "b", "c"]

If I do:

with open("my_yaml.yaml", "w") as f:
    f.write(yaml.dump(my_list, explicit_start=True, default_flow_style=False))

Then my YAML file looks like this:

---
- a
- b
- c

But I'd like each of a, b, and c to show up as strings. So I'd like:

---
- "a"
- "b"
- "c"

If I do this:

new_list = ['\'' + entry + '\'' for entry in my_list]
with open("my_yaml.yaml", "w") as f:
    f.write(yaml.dump(new_list, explicit_start=True, default_flow_style=False))

Then it shows up as:

---
- '''a'''
- '''b'''
- '''c'''

How can I get the desired format?

anon_swe
  • [See this](https://stackoverflow.com/questions/38369833/pyyaml-and-using-quotes-for-strings-only) – swiftg Jul 18 '18 at 23:16
  • Possible duplicate of [pyyaml and using quotes for strings only](https://stackoverflow.com/questions/38369833/pyyaml-and-using-quotes-for-strings-only) – David Maze Jul 18 '18 at 23:33
  • ...the second answer there in fact answers this question. Note that your first two YAML files are semantically identical: both represent lists with the three strings "a", "b", and "c". – David Maze Jul 18 '18 at 23:34

1 Answer


Quotes at the beginning of a scalar have a special meaning in YAML: they distinguish plain scalars from double-quoted and single-quoted scalars. To dump strings that contain such quotes "naturally", YAML has to jump through extra hoops: it inserts a backslash (\) before a double quote inside a double-quoted scalar, or doubles each single quote (') inside a single-quoted scalar. The latter is what causes your output.
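The two escaping rules can be seen in a minimal sketch, assuming PyYAML is installed; the variable names and the sample string 'say "hi"' are only illustrative:

```python
import yaml

# A string whose content starts and ends with a single quote.
with_quotes = "'a'"

# It cannot be written as a plain scalar, so PyYAML single-quotes it
# and doubles every embedded single quote.
single = yaml.safe_dump([with_quotes])
print(single)  # - '''a'''

# When double-quoted style is forced, embedded double quotes get a backslash.
double = yaml.safe_dump(['say "hi"'], default_style='"')
print(double)  # - "say \"hi\""

# Both forms load back to the original Python strings.
print(yaml.safe_load(single), yaml.safe_load(double))
```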

If you don't need any more fine grained control, you can do:

import yaml

my_list = ["a", "b", "c"]
with open("my_yaml.yaml", "w") as f:
    yaml.safe_dump(my_list, f, default_style='"', allow_unicode=True,
                   explicit_start=True, default_flow_style=False)

which gives:

---
- "a"
- "b"
- "c"

Please note that:

  • you should always use safe_dump() (unless you can't); that way you know up front that loading with safe_load() is going to work (you are not loading with the unsafe load(), are you?).
  • PyYAML has a streaming interface: dump() takes the output stream as its second parameter. If you don't specify it, PyYAML streams into an in-memory buffer and returns the content of that buffer, which you then write to the file yourself. Don't do that: it is memory-inefficient and slow.
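The difference between the two calling styles can be sketched as follows, assuming PyYAML is installed; an io.StringIO object stands in for the open file here, purely for illustration:

```python
import io
import yaml

my_list = ["a", "b", "c"]
options = dict(explicit_start=True, default_flow_style=False)

# No stream argument: the whole document is built in memory and returned.
as_string = yaml.safe_dump(my_list, **options)

# Stream argument given: PyYAML writes to it directly and returns None.
buffer = io.StringIO()  # stands in for a file opened with open(..., "w")
result = yaml.safe_dump(my_list, buffer, **options)
streamed = buffer.getvalue()

print(result)                 # None
print(streamed == as_string)  # True
```

The text produced is the same either way; only the intermediate string is avoided when a stream is passed.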

The default_style='"' will double-quote every scalar. Any scalar? Yes, any scalar, including keys for mappings (which actually might be what you want) and integers and all the other special types YAML supports. Non-string scalars additionally get an explicit tag, so that they still load as their original type.

Changing my_list to:

my_list = ["a", "b", "c", dict(d=42), 3.14, True]

will give you:

---
- "a"
- "b"
- "c"
- "d": !!int "42"
- !!float "3.14"
- !!bool "true"
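As a quick check that the explicit tags preserve the types, here is a small round-trip sketch, assuming PyYAML is installed:

```python
import yaml

my_list = ["a", "b", "c", dict(d=42), 3.14, True]

# Dump with every scalar double-quoted; non-strings receive explicit tags.
text = yaml.safe_dump(my_list, default_style='"',
                      explicit_start=True, default_flow_style=False)

# Loading the quoted-and-tagged document restores the original values.
loaded = yaml.safe_load(text)
print(loaded == my_list)                          # True
print(type(loaded[3]["d"]).__name__, type(loaded[5]).__name__)  # int bool
```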

If you don't want that and only want (specific) sequence elements to be quoted, you can use ruamel.yaml (disclaimer: I am the author of that package), and do something like:

import ruamel.yaml 
from ruamel.yaml.scalarstring import (DoubleQuotedScalarString as dq, 
                                      SingleQuotedScalarString as sq)

yaml = ruamel.yaml.YAML()
my_list = [dq("a"), sq("b"), dq("c"), dict(d=42), 3.14, True]

with open("my_yaml.yaml", "w") as f:
    yaml.dump(my_list, f)

to get:

- "a"
- 'b'
- "c"
- d: 42
- 3.14
- true
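For comparison, selective quoting can also be approximated with PyYAML alone by registering a custom representer. This is a different technique from the ruamel.yaml one above, shown only as a minimal sketch; the names Quoted, QuotingDumper and represent_quoted are hypothetical helpers, not part of PyYAML:

```python
import yaml

class Quoted(str):
    """Hypothetical marker type: strings that should be double-quoted."""

class QuotingDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass, so the stock SafeDumper stays untouched."""

def represent_quoted(dumper, data):
    # Emit the value as an ordinary YAML string, forcing double-quoted style.
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style='"')

QuotingDumper.add_representer(Quoted, represent_quoted)

my_list = [Quoted("a"), Quoted("b"), dict(d=42), 3.14, True]
text = yaml.dump(my_list, Dumper=QuotingDumper,
                 explicit_start=True, default_flow_style=False)
print(text)
```

Only the values wrapped in Quoted are quoted; everything else is dumped in PyYAML's default style.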
Anthon