Let's say I have an object already defined in my Python script that serves as a container for some random items. Each attribute of the container corresponds to an item. In this simple example, I have an ITEMS object that has a BALL attribute which points to a Ball instance.

Now, I need to load some content in YAML, but I want that content to be able to reference the existing ITEMS variable that is already defined. Is this possible? Maybe something along the lines of...

ITEMS = Items()
setattr(Items, 'BALL', Ball())

yaml_text = "item1: !!python/object:ITEMS.BALL"
yaml_items = yaml.load(yaml_text)

My goal, after loading the YAML, is for yaml_items['item1'] to be the Ball instance from the ITEMS object.

martineau
dlang
  • What third-party YAML library/module are you using? It's unlikely you'll be able to easily do this unless it's a feature of whatever that is. – martineau Nov 28 '17 at 21:17
  • @martineau I've been trying out both PyYAML and ruamel.yaml with simple things, but the module we go with is still TBD. – dlang Nov 28 '17 at 21:23
  • I was just looking at the [description](https://pypi.python.org/pypi/PyYAML/3.12) of the PyYAML module and it says "PyYAML features a complete YAML 1.1 parser, Unicode support, pickle support, capable extension API, and sensible error messages. PyYAML supports standard YAML tags and **provides Python-specific tags that allow to represent an arbitrary Python object**." (emphasis mine), so it sounds like you just need to figure out how to do that last part. Is there any documentation, and have you read it? – martineau Nov 28 '17 at 21:45
  • @martineau I've looked through docs for [PyYAML](http://pyyaml.org/wiki/PyYAMLDocumentation) and for [ruamel.yaml](http://yaml.readthedocs.io/en/latest/overview.html). There are ways to use python-specific tags, like you mentioned, but I haven't seen anything for using an existing object like I initially proposed. – dlang Nov 29 '17 at 13:41

2 Answers

@martineau quoted the documentation:

[…] provides Python-specific tags that allow to represent an arbitrary Python object.

Note that it says represent, not construct. This means you can dump any Python object to YAML, but you cannot reference an existing Python object from inside YAML.

That being said, you can of course add your own constructor to do it:

import yaml

def eval_constructor(loader, node):
  return eval(loader.construct_scalar(node))

yaml.add_constructor(u'!eval', eval_constructor)

some_value = '123'

yaml_text = "item1: !eval some_value"
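# Recent PyYAML versions require an explicit Loader; add_constructor() without
# a Loader argument registers the tag on FullLoader as well.
yaml_items = yaml.load(yaml_text, Loader=yaml.FullLoader)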

Be aware of the security implications of calling eval() on configuration data: anyone who can edit the YAML file can execute arbitrary Python code!

Mostly copied from this answer
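
If eval() is a concern, one possible variation is to resolve a dotted name with getattr() against an explicit allow-list instead. This is only a sketch under assumptions: the `!ref` tag, `RefLoader`, `ALLOWED_ROOTS`, and the stand-in `Ball` and `Items` classes are hypothetical names, not part of PyYAML or of the question's real code.

```python
import yaml

class Ball:
    """Stand-in for the question's Ball class (assumption)."""

class Items:
    """Stand-in for the question's container class (assumption)."""

ITEMS = Items()
ITEMS.BALL = Ball()

# Only the names listed here can be referenced from YAML (hypothetical allow-list).
ALLOWED_ROOTS = {"ITEMS": ITEMS}

class RefLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the custom tag is not added to other loaders."""

def ref_constructor(loader, node):
    # Split "ITEMS.BALL" into a root name and a chain of attribute names.
    root_name, *attr_names = loader.construct_scalar(node).split(".")
    if root_name not in ALLOWED_ROOTS:
        raise ValueError("unknown reference root: %r" % root_name)
    obj = ALLOWED_ROOTS[root_name]
    for name in attr_names:
        if name.startswith("_"):
            # Refuse private and dunder attributes.
            raise ValueError("private attribute not allowed: %r" % name)
        obj = getattr(obj, name)
    return obj

RefLoader.add_constructor("!ref", ref_constructor)

yaml_items = yaml.load("item1: !ref ITEMS.BALL", Loader=RefLoader)
print(yaml_items["item1"] is ITEMS.BALL)  # -> True
```

Nothing from the YAML text is executed here; it can only walk public attributes of objects that were explicitly exposed.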

martineau
flyx
  • Defining my own constructor is a strategy I've considered, but I was hoping to avoid this so that I wouldn't have to do any of my own text parsing or `eval`ing. That being said, I appreciate the suggestion. Thanks! – dlang Nov 29 '17 at 13:49

Here's a way of doing it that uses the di() function defined in the answer to another question. The id of the target object, as returned by the built-in id() function, is converted to a string and embedded in the YAML text. When yaml.load() encounters the tag, it calls a custom constructor that reverses the process: it converts the string back to an integer and uses di() to recover the object.

Caveat: This takes advantage of the fact that, with CPython at least, the id() function returns the address of the Python object in memory—so it may not work with other implementations of the interpreter.

import _ctypes
import yaml

def di(obj_id):
    """ Reverse of id() function. """
    return _ctypes.PyObj_FromPtr(obj_id)

def py_object_constructor(loader, node):
    return di(int(node.value))

yaml.add_constructor(u'!py_object', py_object_constructor)

class Items(object): pass

def Ball(): return 42

ITEMS = Items()
setattr(Items, 'BALL', Ball())  # Set attribute to result of calling Ball().

yaml_text = "item1: !py_object " + str(id(ITEMS.BALL))
# Recent PyYAML versions require an explicit Loader argument.
yaml_items = yaml.load(yaml_text, Loader=yaml.FullLoader)

print(yaml_items['item1'])  # -> 42

If you're OK with using eval(), you could formalize this and make it easier to use by monkey-patching the yaml module's load() function to do some preprocessing of the yaml stream:

import _ctypes
import re
import yaml

#### Monkey-patch yaml module.
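def _my_load(yaml_text, *args, **kwargs):
    REGEX = r'@@(.+?)@@'  # Non-greedy, so each placeholder is matched separately.

    def replace_with_id(match):
        # Evaluate the placeholder expression and substitute the object's id.
        return str(id(eval(match.group(1))))

    # Replace every placeholder with the id of its own object.
    yaml_text = re.sub(REGEX, replace_with_id, yaml_text)

    return _yaml_load(yaml_text, *args, **kwargs)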

_yaml_load = yaml.load  # Save original function.
yaml.load = _my_load  # Change it to custom version.
#### End monkey-patch yaml module.

def di(obj_id):
    """ Reverse of id() function. """
    return _ctypes.PyObj_FromPtr(obj_id)

def py_object_constructor(loader, node):
    return di(int(node.value))

yaml.add_constructor(u'!py_object', py_object_constructor)

class Items(object): pass

def Ball(): return 42

ITEMS = Items()
setattr(Items, 'BALL', Ball())  # Set attribute to result of calling Ball().

yaml_text = "item1: !py_object @@ITEMS.BALL@@"
# Recent PyYAML versions require an explicit Loader argument.
yaml_items = yaml.load(yaml_text, Loader=yaml.FullLoader)
print(yaml_items['item1'])  # -> 42
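
Both versions above depend on `_ctypes.PyObj_FromPtr()`, which is CPython-specific and can crash the interpreter if it is given an id that no longer refers to a live object. A possible alternative, sketched here under assumptions, keeps the same `!py_object` tag but looks ids up in an ordinary dictionary. The names `_REGISTRY`, `register()`, and `RegistryLoader` are hypothetical helpers, and `Ball` and `Items` are stand-ins.

```python
import yaml

_REGISTRY = {}  # Maps str(id(obj)) -> obj (hypothetical helper).

def register(obj):
    """Remember obj and return the key to embed in the YAML text."""
    key = str(id(obj))
    # Holding a reference keeps the object alive, so its id stays valid.
    _REGISTRY[key] = obj
    return key

def py_object_constructor(loader, node):
    # An unknown id raises KeyError instead of reading arbitrary memory.
    return _REGISTRY[loader.construct_scalar(node)]

class RegistryLoader(yaml.SafeLoader):
    """SafeLoader subclass that understands only the extra !py_object tag."""

RegistryLoader.add_constructor("!py_object", py_object_constructor)

class Ball:
    """Stand-in for the question's Ball class (assumption)."""

class Items:
    """Stand-in for the question's container class (assumption)."""

ITEMS = Items()
ITEMS.BALL = Ball()

yaml_text = "item1: !py_object " + register(ITEMS.BALL)
yaml_items = yaml.load(yaml_text, Loader=RegistryLoader)
print(yaml_items["item1"] is ITEMS.BALL)  # -> True
```

The trade-off is that every object has to be registered before the YAML is loaded, but the lookup no longer depends on how the interpreter implements id().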
martineau
  • This is a very interesting and clever implementation, and I may end up giving this a try. Thanks! – dlang Nov 30 '17 at 17:07
  • Also note that this doesn't use `eval()`, so is more secure than what's in @flyx's answer. Doing something like this for JSON is better and easier to implement because the `json` module has a hook that allows something like the `+ str(id(ITEMS.BALL))` part to be done more automatically. I'm not that familiar with the `yaml` module (I installed it just to test my answer), so there may be a better way to implement it than what is shown here. – martineau Nov 30 '17 at 17:32
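
For reference, one way to read the `json` hook idea from the last comment is sketched below. It is an assumption about what was meant, not the commenter's code: `RefEncoder`, `ref_hook`, `REGISTRY`, and the `__py_object__` key are hypothetical names. `JSONEncoder.default()` and the `object_hook` parameter are the standard-library hooks being used.

```python
import json

REGISTRY = {}  # Maps str(id(obj)) -> obj (hypothetical helper).

class RefEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects json cannot serialize by itself:
        # remember the object and emit a small marker dict in its place.
        key = str(id(obj))
        REGISTRY[key] = obj
        return {"__py_object__": key}

def ref_hook(decoded):
    # Called for every decoded JSON object; swap markers back for the originals.
    if "__py_object__" in decoded:
        return REGISTRY[decoded["__py_object__"]]
    return decoded

class Ball:
    """Stand-in for the question's Ball class (assumption)."""

ball = Ball()
text = json.dumps({"item1": ball}, cls=RefEncoder)
items = json.loads(text, object_hook=ref_hook)
print(items["item1"] is ball)  # -> True
```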