
Please note that this question is similar to this one, but different enough that those answers won't solve my problem:

  • To insert control characters such as \x08, it seems that I have to use double quotes ".
  • All spaces need to be preserved exactly as given. For line breaks I explicitly use \n.

I have some string data which I need to store in YAML, e.g.:

  • " This is my quite long string data "
  • "This is my quite long string data"
  • "This_is_my_quite_long_string_data"
  • "Sting data\nwhich\x08contains control characters"

and need it in YAML as something like this:

Key: "  This  is  my" +
     "  quite  long " +
     " string  data  "

This is no problem as long as I stay on a single line, but I don't know how to split the string content across multiple lines.

YAML block scalar styles (>, |) won't help here, because they don't allow escaping and they also do some whitespace stripping and newline/space substitution, which is useless for my case.

It looks like the only way is to use double quotes " and backslashes \, like this:

Key: "\
  This is \
  my quite \
  long string data\
  "

Trying this in an online YAML parser results in "This is my quite long string data", as expected.

Unfortunately, it fails if one of the "sub-lines" has a leading space, like this:

Key: "\
  This is \
  my quite\
   long st\
  ring data\
  "

This results in "This is my quitelong string data": the space between the words quite and long has been removed. The only solution that comes to mind is to replace the first leading space of each sub-line with \x20, like this:

Key: "\
  This is \
  my quite\
  \x20long st\
  ring data\
  "

As I chose YAML to get the most human-readable format possible, I find \x20 a rather ugly solution. Maybe someone knows a better approach?

To keep it human-readable, I also don't want to use !!binary for this.

Joe
  • Could you come up with a title and first paragraph that makes clearer why this is not a duplicate of the existing question you linked to? In particular, what is special about your situation that none of the techniques in [the most up-voted answer](https://stackoverflow.com/a/21699210/157957) apply? I think possibly your actual question is "How do I preserve leading spaces in a multi-line string?" or something similar. – IMSoP Sep 05 '17 at 17:40
  • Done. Moved reference to the other question to first paragraph and also inserted "preserve spaces" to the title. – Joe Sep 06 '17 at 06:34
  • Possible duplicate of [How do I break a string over multiple lines?](https://stackoverflow.com/questions/3790454/how-do-i-break-a-string-over-multiple-lines) – codeforester Jul 12 '19 at 01:02

2 Answers


Instead of \x20, you can simply escape the first non-indentation space on the line:

Key: "\
  This is \
  my quite\
  \ long st\
  ring data\
  "

This also works with multiple spaces; you only need to escape the first one.
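
One way to check this is to load the snippet with a YAML parser. The sketch below assumes that PyYAML (a third-party package) is installed; `DOC` and `load_key` are hypothetical names used only for illustration, and other parsers are expected, but not verified here, to give the same result.

```python
try:
    import yaml  # PyYAML (third-party); assumed to be installed
except ImportError:
    yaml = None

# The document from this answer: every line break is escaped with a trailing
# backslash, and the space before "long" is escaped as "\ ".
DOC = (
    'Key: "\\\n'
    '  This is \\\n'
    '  my quite\\\n'
    '  \\ long st\\\n'
    '  ring data\\\n'
    '  "\n'
)


def load_key(doc):
    """Return the value stored under "Key", or None if PyYAML is missing."""
    if yaml is None:
        return None
    return yaml.safe_load(doc)["Key"]


print(repr(load_key(DOC)))
```

With PyYAML available, this should print 'This is my quite long string data', with the space between "quite" and "long" preserved.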

flyx

You are right in your observation that control characters can only be represented in double quoted scalars.

However the parser doesn't fail if the sub-lines (in YAML speak: continuation lines) have a leading space. It is your interpretation of the YAML standard that is incorrect. The standard explicitly states that for multi-line double quoted scalars:

All leading and trailing white space characters are excluded from the content.

So you can put as many spaces as you want before long; it will not make a difference.

The representer for double quoted scalars in Python (both in ruamel.yaml and PyYAML) always represents newlines as \n. I am not aware of YAML representers in other languages that give you more control over this (e.g. emitting double newlines to represent \n in your double quoted scalars). So you probably have to write your own representer.

While writing a representer, you can try to make the line breaking smart, so that it minimizes the number of escaped spaces (by putting them between words on the same line). But especially for strings with a high double-space-to-word ratio, combined with a small width to operate in, it will be hard (if not impossible) to do without escaped spaces.
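
As a rough illustration of what such a representer has to do, here is a deliberately naive sketch in plain Python. It is not part of PyYAML or ruamel.yaml: `escape_chunk` and `fold_double_quoted` are hypothetical helpers, the break positions are fixed-width rather than smart, and only ASCII control characters are escaped. It shows the one rule that matters here: a continuation line that starts with a space gets that space escaped as "\ ".

```python
def escape_chunk(text):
    """Escape characters that are special inside a YAML double-quoted scalar.

    Only ASCII control characters are handled; this is a simplification.
    """
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\x%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def fold_double_quoted(key, value, width=12, indent="  "):
    """Emit `key: "..."` with the value split over escaped continuation lines."""
    # Split the raw string before escaping so an escape sequence is never cut.
    pieces = [value[i:i + width] for i in range(0, len(value), width)]
    lines = [key + ': "\\']
    for piece in pieces:
        escaped = escape_chunk(piece)
        if escaped.startswith(" "):
            # Leading whitespace of a continuation line is dropped by the
            # parser, so the first space has to be written as "\ ".
            escaped = "\\" + escaped
        lines.append(indent + escaped + "\\")
    lines.append(indent + '"')
    return "\n".join(lines) + "\n"


print(fold_double_quoted("Key", "This is my quite long string data", width=8))
```

With width=8, the third chunk starts with the space before "long", so it is emitted as `\ long st\`. A smarter version would move the break point so that fewer chunks start with a space.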

Such a representer should IMO first check whether double quoting is necessary (i.e. there are control characters apart from newlines). If not, and there are newlines, you are probably better off representing the string as a block style literal scalar (for which spaces at the beginning or end of a line are not excluded).
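
That check could look roughly like the following sketch. The function names and the returned style labels are hypothetical, and treating tab and DEL as control characters that force double quoting is a conservative assumption rather than something the YAML specification requires.

```python
def needs_double_quotes(text):
    """True if text contains an ASCII control character other than newline.

    Assumption: tab and DEL are treated as control characters here.
    """
    return any(
        (ord(ch) < 0x20 and ch != "\n") or ord(ch) == 0x7F
        for ch in text
    )


def choose_style(text):
    """Pick a scalar style label for text (labels are illustrative only)."""
    if needs_double_quotes(text):
        return "double-quoted"
    if "\n" in text:
        return "literal"
    return "single-line"


for sample in ("Sting data\nwhich\x08contains control characters",
               "two\nlines",
               " This is my quite long string data "):
    print(repr(sample), "->", choose_style(sample))
```

A real representer would still have to deal with details this sketch ignores, such as leading spaces on the first line of a literal block.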

Anthon
  • I think you've overinterpreted the word "fails". It's clear from the next sentence in the question that it just means "fails to give the desired result" - specifically, it fails to preserve the space between "quite" and "long" in the example. There's no mention in the question of the YAML spec, only of trying some sample inputs and not getting the desired result, so the fact that the spec confirms what the OP already knows about leading spaces is pretty irrelevant. – IMSoP Sep 06 '17 at 08:49