I'm using PyYAML (version 5.1) and Python 2 to parse the YAML body of an incoming POST API request.

Once loaded, the body of the incoming request contains a mix of unicode objects and str objects.

I use the solution given in the link to load the YAML mapping into an OrderedDict, where the stream is the YAML body of the incoming POST request.

However, I have to pass the resulting OrderedDict to a library that only accepts str objects.

I can neither change nor update the library, and I have to use Python 2.

My current solution is to:

  1. take the OrderedDict generated from the link
  2. recursively traverse it, converting every unicode object into a str object

The sample code is as follows:

def convert(data):
    if isinstance(data, unicode):
        return data.encode('utf-8')
    if isinstance(data, list):
        return [convert(item) for item in data]
    if isinstance(data, dict):
        newData = {}
        for key, value in data.iteritems():
            newData[convert(key)] = convert(value)
        return newData
    return data
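
One caveat with the function above: it rebuilds every mapping as a plain dict, so on Python 2 the key order of the OrderedDict is lost. Below is a minimal sketch of an order-preserving variant. The names `convert_ordered` and `text_type` are hypothetical, and `text_type` is only there so that the sketch also runs on Python 3, where the encoded values are bytes.

```python
from collections import OrderedDict

# Alias for the text type: `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def convert_ordered(data):
    """Recursively encode text to UTF-8 byte strings, keeping key order."""
    if isinstance(data, text_type):
        return data.encode("utf-8")
    if isinstance(data, list):
        return [convert_ordered(item) for item in data]
    if isinstance(data, dict):
        # OrderedDict is a dict subclass, so this branch handles it too;
        # rebuilding with OrderedDict keeps the original key order.
        return OrderedDict(
            (convert_ordered(key), convert_ordered(value))
            for key, value in data.items()
        )
    return data


sample = OrderedDict([(u"b", [u"x", 1]), (u"a", u"caf\u00e9")])
result = convert_ordered(sample)
```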

Although this works, it is inefficient because the complete OrderedDict is traversed again after it has been created.

Is there a way to convert the data before or during the generation of the OrderedDict, so that it does not have to be traversed again?

1 Answer

You can provide a custom constructor that will always load YAML !!str scalars to Python unicode strings:

import yaml
from yaml.resolver import BaseResolver

def unicode_constructor(self, node):
  # this will always return a unicode string;
  # the default loader would convert it to ASCII-encoded str if possible.
  return self.construct_scalar(node)

yaml.add_constructor(BaseResolver.DEFAULT_SCALAR_TAG, unicode_constructor)

Afterwards, yaml.load will always return unicode strings.

(Code untested as I don't have a Python 2 installation)
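
If the consuming library needs byte strings (Python 2 str) rather than unicode, as the question describes, the same hook could encode each scalar instead. A minimal sketch, assuming UTF-8 is the desired encoding: `ByteStringLoader` and `bytes_constructor` are hypothetical names, and the constructor is registered on a SafeLoader subclass so that other loaders stay unaffected. It is written so that it also runs on Python 3, where the result is bytes.

```python
import yaml
from yaml.resolver import BaseResolver


class ByteStringLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass; the name is not part of PyYAML."""


def bytes_constructor(loader, node):
    # construct_scalar yields the scalar's text; encode it to UTF-8 so every
    # !!str node (mapping keys included) becomes a byte string.
    return loader.construct_scalar(node).encode("utf-8")


# Register on the subclass only, leaving yaml.SafeLoader itself untouched.
ByteStringLoader.add_constructor(BaseResolver.DEFAULT_SCALAR_TAG, bytes_constructor)

data = yaml.load(u"name: caf\u00e9\ncount: 3\n", Loader=ByteStringLoader)
```

The same add_constructor call could presumably be made on whichever Loader subclass builds the OrderedDict, so that both customisations are applied in a single pass over the document.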

flyx
  • The suggested code works fine, but I could not find good documentation about the usage of add_constructor. Can you please point towards that too? – Puneet Ugru Sep 30 '20 at 14:35
  • The [official PyYAML documentation](https://pyyaml.org/wiki/PyYAMLDocumentation) describes its general usage but does not go into much detail. To be frank, most things I know about how to use PyYAML come from reading its source code and knowing how the [YAML specification](https://yaml.org/spec/1.2/spec.html) describes the loading process of a file (covered in Chapter 3). – flyx Sep 30 '20 at 15:35