
I'm working on a project to pull user information out of a MySQL database and format it into a YAML file that Ansible can read and use as a vars file. I need all the normal user info (username, email, etc.), along with each user's public ssh key from the database.

Problem is, PyYAML is inserting an extra line break before the email part of the pubkey, and I cannot figure out why. Here is a simple example:

import yaml

yamldict = { "users": [] }

yamldict["users"].append({
    "username": "user",
    "name": "user",
    "sshkey": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHV/xbvOHuPq6WbBhtmjUWKYPrqQlkILf8b/I6V9dZVBPzmhRZFCAf/gWny0hmZ95bVRED4iCSTCtN3Lq2VZiZ/kwBO7Y9E4vr1wVQYrr4IIwEhdaifZmWFLlwOXbt76dxJQs2xS9Z5ZQjEzZBFZqgYu42QbSi7tKBNSaLadOWbB3sq0IOzCZeSgrELlZIuUy7u1RbcS4w2Y29S3XLrbi2yVdVbPW8B9PfsG1n4q2/XR7w3gqhP6c8ibO4jYpADLZuHZvuoVpjKINO4kSdrwUfD8rl3MBIAD/Nu9sy0bIiKdSONQohxcsjMevxPOijjz4EiI1Ad4U6dDJrFlT0asYH user@email.com"
})

print(yaml.dump(yamldict, default_flow_style=False))

which outputs:

users:
- name: user
  sshkey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHV/xbvOHuPq6WbBhtmjUWKYPrqQlkILf8b/I6V9dZVBPzmhRZFCAf/gWny0hmZ95bVRED4iCSTCtN3Lq2VZiZ/kwBO7Y9E4vr1wVQYrr4IIwEhdaifZmWFLlwOXbt76dxJQs2xS9Z5ZQjEzZBFZqgYu42QbSi7tKBNSaLadOWbB3sq0IOzCZeSgrELlZIuUy7u1RbcS4w2Y29S3XLrbi2yVdVbPW8B9PfsG1n4q2/XR7w3gqhP6c8ibO4jYpADLZuHZvuoVpjKINO4kSdrwUfD8rl3MBIAD/Nu9sy0bIiKdSONQohxcsjMevxPOijjz4EiI1Ad4U6dDJrFlT0asYH
    user@email.com
  username: user

I've tried many different ways to strip out extra whitespace, newlines and carriage returns. I've also tried converting this dict to json, and the ssh key looks good there, and then running yaml.dump on the json and it still gives me that extra newline.

Any ideas what I'm doing wrong here?

  • Does it really matter? Normally, the part after the key is purely descriptive and not meaningful for the algorithm itself. – glglgl Apr 20 '18 at 21:30
  • im not sure, does it? This will eventually be used to create ssh accounts, I'm not sure if the extra line break is going to cause a problem for ansible's authorized_keys module....guess I should test that and maybe stop obsessing with this.... – jasondewitt Apr 20 '18 at 22:00

2 Answers

YAML can represent a string as a scalar in multiple ways: plain (without quotes), single quoted, double quoted, with literal or folded style. Your value for the key sshkey is a plain scalar.

YAML also wants to be readable, and long lines are not very readable. So there are rules for how to wrap the long lines caused by wide scalars. Your plain scalar that is the value for sshkey is wrapped. That means there is a newline in the YAML document, but there is no newline in the scalar string it represents, and on reading the YAML document, that newline gets "unfolded".

You can see this by running the following with your yamldict definition:

with open('tmp.yaml', 'w') as fp:
    yaml.safe_dump(yamldict, fp)
with open('tmp.yaml') as fp:
    data = yaml.safe_load(fp)

assert '\n' in data['users'][0]['sshkey']

This raises an AssertionError, as there is no newline in the re-loaded ssh-key.

So your program is fine, but the thing you have been doing wrong is that you did not read the YAML specification, in particular the part on line folding.
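The unfolding can also be seen on a tiny hand-written document. This is only a sketch: the key body is shortened to the placeholder "AAAA", and the two-line layout mimics the wrapped output from the question.

```python
import yaml

# A plain scalar that continues on an indented second line, like the
# wrapped sshkey in the question (key body shortened to "AAAA").
document = "sshkey: ssh-rsa AAAA\n  user@email.com\n"

loaded = yaml.safe_load(document)

# The line break inside the scalar is folded back into a single space.
print(repr(loaded["sshkey"]))  # 'ssh-rsa AAAA user@email.com'
```

The loaded value contains a space where the document had a line break, so whatever consumes the loaded data sees the key on one line.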


Now this particular folding doesn't really make things more readable, as there are not enough spaces in the ssh-key. So you might as well increase the line width and get everything on one line. You can do that with PyYAML, but I recommend you use ruamel.yaml for that, which supports the newer YAML 1.2 standard, allows separate indent values for mappings and sequences, and has many PyYAML issues fixed (disclaimer: I am the author of that package):

import sys
from ruamel.yaml import YAML

yaml = YAML()
yaml.width = 1024
# yaml.indent(sequence=4, offset=2)  # uncomment to indent the sequences "-"

yamldict = { "users": [] }

yamldict["users"].append({
    "username": "user",
    "name": "user",
    "sshkey": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHV/xbvOHuPq6WbBhtmjUWKYPrqQlkILf8b/I6V9dZVBPzmhRZFCAf/gWny0hmZ95bVRED4iCSTCtN3Lq2VZiZ/kwBO7Y9E4vr1wVQYrr4IIwEhdaifZmWFLlwOXbt76dxJQs2xS9Z5ZQjEzZBFZqgYu42QbSi7tKBNSaLadOWbB3sq0IOzCZeSgrELlZIuUy7u1RbcS4w2Y29S3XLrbi2yVdVbPW8B9PfsG1n4q2/XR7w3gqhP6c8ibO4jYpADLZuHZvuoVpjKINO4kSdrwUfD8rl3MBIAD/Nu9sy0bIiKdSONQohxcsjMevxPOijjz4EiI1Ad4U6dDJrFlT0asYH user@email.com"
})


yaml.dump(yamldict, sys.stdout)

this dumps as:

users:
- username: user
  name: user
  sshkey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHV/xbvOHuPq6WbBhtmjUWKYPrqQlkILf8b/I6V9dZVBPzmhRZFCAf/gWny0hmZ95bVRED4iCSTCtN3Lq2VZiZ/kwBO7Y9E4vr1wVQYrr4IIwEhdaifZmWFLlwOXbt76dxJQs2xS9Z5ZQjEzZBFZqgYu42QbSi7tKBNSaLadOWbB3sq0IOzCZeSgrELlZIuUy7u1RbcS4w2Y29S3XLrbi2yVdVbPW8B9PfsG1n4q2/XR7w3gqhP6c8ibO4jYpADLZuHZvuoVpjKINO4kSdrwUfD8rl3MBIAD/Nu9sy0bIiKdSONQohxcsjMevxPOijjz4EiI1Ad4U6dDJrFlT0asYH user@email.com

The other thing you can do is dump that key as a literal style scalar. For that you need to add an import, from ruamel.yaml.scalarstring import PreservedScalarString, and then convert the key to a preserved scalar string somewhere after reading in the data from MySQL. In your example you could e.g. do:

for m in yamldict['users']:
    m['sshkey'] = PreservedScalarString(m['sshkey'])

Assuming you remove the yaml.width = 1024 line and include yaml.indent(sequence=4, offset=2), this will then dump as:

users:
  - username: user
    name: user
    sshkey: |-
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHV/xbvOHuPq6WbBhtmjUWKYPrqQlkILf8b/I6V9dZVBPzmhRZFCAf/gWny0hmZ95bVRED4iCSTCtN3Lq2VZiZ/kwBO7Y9E4vr1wVQYrr4IIwEhdaifZmWFLlwOXbt76dxJQs2xS9Z5ZQjEzZBFZqgYu42QbSi7tKBNSaLadOWbB3sq0IOzCZeSgrELlZIuUy7u1RbcS4w2Y29S3XLrbi2yVdVbPW8B9PfsG1n4q2/XR7w3gqhP6c8ibO4jYpADLZuHZvuoVpjKINO4kSdrwUfD8rl3MBIAD/Nu9sy0bIiKdSONQohxcsjMevxPOijjz4EiI1Ad4U6dDJrFlT0asYH user@email.com

Here | indicates a literal style block scalar, and the - is the chomping indicator that strips the trailing newline.


If you need to stick with PyYAML, use safe_dump(yamldict, ..., width=1024). However, PyYAML has no easy way to dump the key as a literal style block scalar, nor to indent only the sequences.
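A minimal sketch of the PyYAML width option, assuming a made-up key (`placeholder_key` is a stand-in of roughly the same shape as a real public key, not the key from the question):

```python
import yaml

# Placeholder key: "ssh-rsa", a long run without spaces, then a comment.
placeholder_key = "ssh-rsa " + "A" * 372 + " user@email.com"
yamldict = {
    "users": [
        {"username": "user", "name": "user", "sshkey": placeholder_key}
    ]
}

# Default width: the long plain scalar is wrapped at a space.
default_text = yaml.safe_dump(yamldict, default_flow_style=False)

# A width larger than the key keeps the scalar on one line.
wide_text = yaml.safe_dump(yamldict, default_flow_style=False, width=1024)

print(wide_text)
```

Both documents load back to the same data; only the layout of the YAML text differs.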

Anthon
  • Thanks for the comprehensive answer, this is great. Unfortunately I need to stick with PyYAML for now because I'm running on an old Centos6 server and dont want to play around installing new stuff. But I am definitely going to start using your library in other places where I do have more control. – jasondewitt Apr 22 '18 at 21:23
  • @jasondewitt Often the choice of library is limited by the circumstances, that's why I included how to do some of this in PyYAML. I always use virtualenvs (with the system Python or a newer version) for utilities/programs I make, in order not to mess-up the system Python installation. – Anthon Apr 24 '18 at 05:11
  • BTW It is interesting to have an accepted answer with -1 score, I must have stepped on someone's toes recently... – Anthon Apr 24 '18 at 05:12

This is my solution, using PyYAML:

import yaml

def add_line_breaks(long_string, line_len=70):
    return '\n'.join(long_string[i:i+line_len] for i in range(0, len(long_string), line_len))

def long_str_representer(dumper, data): # https://stackoverflow.com/a/33300001/10590519
    if len(data.splitlines()) > 1:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, long_str_representer)

yamldict = { "users": [] }

yamldict["users"].append({
    "username": "user",
    "name": "user",
    "sshkey": add_line_breaks("ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHV/xbvOHuPq6WbBhtmjUWKYPrqQlkILf8b/I6V9dZVBPzmhRZFCAf/gWny0hmZ95bVRED4iCSTCtN3Lq2VZiZ/kwBO7Y9E4vr1wVQYrr4IIwEhdaifZmWFLlwOXbt76dxJQs2xS9Z5ZQjEzZBFZqgYu42QbSi7tKBNSaLadOWbB3sq0IOzCZeSgrELlZIuUy7u1RbcS4w2Y29S3XLrbi2yVdVbPW8B9PfsG1n4q2/XR7w3gqhP6c8ibO4jYpADLZuHZvuoVpjKINO4kSdrwUfD8rl3MBIAD/Nu9sy0bIiKdSONQohxcsjMevxPOijjz4EiI1Ad4U6dDJrFlT0asYH user@email.com")
})

print(yaml.dump(yamldict, default_flow_style=False))

This will output:

users:
- name: user
  sshkey: |-
    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHV/xbvOHuPq6WbBhtmjUWKYPrqQlkIL
    f8b/I6V9dZVBPzmhRZFCAf/gWny0hmZ95bVRED4iCSTCtN3Lq2VZiZ/kwBO7Y9E4vr1wVQ
    Yrr4IIwEhdaifZmWFLlwOXbt76dxJQs2xS9Z5ZQjEzZBFZqgYu42QbSi7tKBNSaLadOWbB
    3sq0IOzCZeSgrELlZIuUy7u1RbcS4w2Y29S3XLrbi2yVdVbPW8B9PfsG1n4q2/XR7w3gqh
    P6c8ibO4jYpADLZuHZvuoVpjKINO4kSdrwUfD8rl3MBIAD/Nu9sy0bIiKdSONQohxcsjMe
    vxPOijjz4EiI1Ad4U6dDJrFlT0asYH user@email.com
  username: user
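Note that add_line_breaks puts real newline characters into the key, so the value loaded back from this YAML is no longer the original single-line string, which may matter for consumers that expect the key on one line. A possible variation, sketched here with a made-up key, marks the key with a str subclass and dumps only that type in literal style. `LiteralStr`, `represent_literal_str` and `placeholder_key` are hypothetical names introduced for this illustration.

```python
import yaml


class LiteralStr(str):
    """Hypothetical marker type: values of this type are dumped with '|'."""


def represent_literal_str(dumper, data):
    # Emit the string as a literal style block scalar, unchanged.
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style="|")


# Register the representer for the marker type on the safe dumper only.
yaml.add_representer(LiteralStr, represent_literal_str, Dumper=yaml.SafeDumper)

# Placeholder key, not the key from the question.
placeholder_key = "ssh-rsa " + "A" * 372 + " user@email.com"
yamldict = {
    "users": [
        {"username": "user", "name": "user", "sshkey": LiteralStr(placeholder_key)}
    ]
}

text = yaml.safe_dump(yamldict, default_flow_style=False)
reloaded = yaml.safe_load(text)

print(text)
```

Because the string itself is not modified, the reloaded key is identical to the original, and other strings in the data keep their default style.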
bitinerant