Elegant way to parse strings which contain lists generated with join() in Python

Question

I'm trying to find a general way of generating objects which can be converted to strings and back again using the parse module. For example, for a class StringyObject whose instances have just two attributes a and b:

import parse

class StringyObject(object):
    fmt = "{a} {b}"

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __str__(self):
        return self.fmt.format(a=self.a, b=self.b)

    @classmethod
    def parse(cls, string):
        result = parse.parse(cls.fmt, string)
        kwargs = result.named
        return cls(**kwargs)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        else:
            return NotImplemented

if __name__ == "__main__":
    obj = StringyObject("foo", "bar")
    reconstructed_obj = StringyObject.parse(str(obj))
    assert reconstructed_obj == obj, "The reconstructed object should be equivalent to the original one."

The script consecutively calls the __str__ instance method and the parse class method, and verifies that the resulting objects obj and reconstructed_obj are equivalent (defined here as being instances of the same class and having the same dictionaries; cf. Elegant ways to support equivalence ("equality") in Python classes).

So far, so good, but I'd like to extend this method to attributes which are lists of variable length. For example, if b is a list, then I could do the following:

import parse

class StringyObject(object):
    fmt = "{a} {b}"
    separator = ", "

    def __init__(self, a, b):
        self.a = a
        assert isinstance(b, list), "b should be a list."
        self.b = b

    def __str__(self):
        b_string = self.separator.join(self.b)
        return self.fmt.format(a=self.a, b=b_string)

    @classmethod
    def parse(cls, string):
        result = parse.parse(cls.fmt, string)
        kwargs = result.named
        kwargs['b'] = kwargs['b'].split(cls.separator)
        return cls(**kwargs)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        else:
            return NotImplemented

if __name__ == "__main__":
    obj = StringyObject("foo", ["bar1", "bar2"])
    reconstructed_obj = StringyObject.parse(str(obj))
    assert reconstructed_obj == obj, "The reconstructed object should be equivalent to the original object."

This still works for this example, but it is less elegant because I now have to use join() and split(), which is what I wanted to avoid by using parse.parse. Furthermore, if I add another attribute c which comes after b in the string representation, the parsing goes haywire:

class StringyObject(object):
    fmt = "{a} {b} {c}"
    separator = ", "

    def __init__(self, a, b, c):
        self.a = a
        assert isinstance(b, list), "b should be a list."
        self.b = b
        self.c = c

    def __str__(self):
        b_string = self.separator.join(self.b)
        return self.fmt.format(a=self.a, b=b_string, c=self.c)

Then running the script

obj = StringyObject("foo", ["bar1", "bar2"], "hello")
result = parse.parse(StringyObject.fmt, str(obj))

produces the wrong Result object:

<Result () {'a': 'foo', 'c': 'bar2 hello', 'b': 'bar1,'}>

What I would actually like to do is implement a kind of 'sub-parser' for b which keeps on running as long as it can find a separator, and only then continues with parsing c. Is there an elegant way to do this?
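The Result shown above is consistent with every untyped field being matched as lazily as possible. The sketch below reproduces the same split with the standard re module; the pattern is an assumed regex equivalent of the format string "{a} {b} {c}", not something taken from the parse source, and lazy_parse is a hypothetical helper name.

```python
import re

# Assumed regex equivalent of the format "{a} {b} {c}": every field is a
# lazy group, so each one stops at the first space that lets the rest match.
LAZY_PATTERN = re.compile(r"(?P<a>.+?) (?P<b>.+?) (?P<c>.+?)")


def lazy_parse(string):
    """Return the named groups, or None if the string does not match."""
    match = LAZY_PATTERN.fullmatch(string)
    return match.groupdict() if match else None


fields = lazy_parse("foo bar1, bar2 hello")
print(fields)  # {'a': 'foo', 'b': 'bar1,', 'c': 'bar2 hello'}
```

Because b is satisfied by the shortest match, it stops at the space inside the separator ", ", and c absorbs everything that is left.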

*contain lists generated with join() in Python* It is WRONG. `' '.join()` do not generate a list but a string from list — Moinuddin Quadri, Dec 19 '16 at 17:43
[Moinuddin Quadri](http://stackoverflow.com/users/2063361/moinuddin-quadri), a string cannot 'contain' a list in the Pythonic sense of the word, so what I mean in this context is a 'listing' or 'enumeration' such as "a, b, c" within a string. — Kurt Peek, Dec 20 '16 at 09:42

score 1 · Answer 1 · answered Dec 19 '16 at 19:24

My suggestion is to look into using ast.literal_eval. This function safely evaluates a string containing a Python literal structure (ints, floats, strings, lists, dicts, ...) without executing arbitrary code.

I wasn't able to get your examples to work using the parse library, but if you modify your format string slightly, it will work pretty easily with ast.literal_eval:

import ast

class StringyObject(object):
    fmt = "{a!r}, {b!r}"

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __str__(self):
        return self.fmt.format(a=self.a, b=self.b)

    @classmethod
    def parse(cls, string):
        return cls(*ast.literal_eval(string))

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        else:
            return NotImplemented

if __name__ == "__main__":
    objects = [
        ("foo", "bar"),
        ("foo", ["bar1", "bar2"]),
        (["foo1", ("foo2", ["foo3", {"foo4"}])], {"bar1": "bar2", "bar3": ["bar4", "bar5"]}),
    ]
    for a, b in objects:
        obj = StringyObject(a, b)
        reconstructed_obj = StringyObject.parse(str(obj))
        assert reconstructed_obj == obj, "The reconstructed object should be equivalent to the original one."

The downside to this implementation is that it only works for basic Python literals; for example, StringyObject(frozenset(['foo']), 'bar') won't round-trip, because the repr of a frozenset is a constructor call rather than a literal.
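A small check of that limitation, assuming Python 3 (where the repr of a frozenset is a call expression such as frozenset({'foo'})). The helper name round_trips is hypothetical and only serves this illustration.

```python
import ast


def round_trips(value):
    """Return True if ast.literal_eval(repr(value)) reproduces value."""
    try:
        return ast.literal_eval(repr(value)) == value
    except (ValueError, SyntaxError):
        # literal_eval rejects anything that is not a plain literal,
        # such as the call expression in "frozenset({'foo'})".
        return False


list_ok = round_trips(["bar1", "bar2"])
dict_ok = round_trips({"bar1": "bar2", "bar3": ["bar4", "bar5"]})
frozenset_ok = round_trips(frozenset(["foo"]))
print(list_ok, dict_ok, frozenset_ok)  # True True False
```

Lists and dicts of strings survive the round trip, while the frozenset is rejected before any object is built.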

score 0 · Accepted Answer · answered Dec 20 '16 at 09:53

I found that the desired parsing result could be achieved by adding some 'fixed' characters (not just spaces) in the format string. For example, below I've put a pipe (|) between the {b} and {c}:

import parse

class StringyObject(object):
    fmt = "{a} {b} | {c}"
    separator = ", "

    def __init__(self, a, b, c):
        self.a = a
        assert isinstance(b, list), "b should be a list."
        self.b = b
        self.c = c

    def __str__(self):
        b_string = self.separator.join(self.b)
        return self.fmt.format(a=self.a, b=b_string, c=self.c)

    @classmethod
    def parse(cls, string):
        result = parse.parse(cls.fmt, string)
        kwargs = result.named
        kwargs['b'] = kwargs['b'].split(cls.separator)
        return cls(**kwargs)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        else:
            return NotImplemented

if __name__ == "__main__":
    obj = StringyObject("foo", ["bar1", "bar2"], "hello")
    result = parse.parse(StringyObject.fmt, str(obj))
    print(result)

    reconstructed_obj = StringyObject.parse(str(obj))
    assert reconstructed_obj == obj, "The reconstructed object should be equivalent to the original object."

The printed Result is

<Result () {'a': 'foo', 'c': 'hello', 'b': 'bar1, bar2'}>

as desired. The reconstructed_obj is also equivalent to the original obj.
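If changing the format string is not an option, the 'sub-parser' idea from the question can also be approximated with the standard re module instead of parse. This is only a sketch of a different technique, under the assumptions that a and c contain no whitespace and that list items contain neither whitespace nor commas; ITEM, PATTERN and parse_fields are hypothetical names.

```python
import re

SEPARATOR = ", "
ITEM = r"[^\s,]+"  # assumption: an item contains no whitespace or commas

# b is one item followed by any number of "separator + item" repetitions,
# so it keeps consuming input for as long as a separator follows.
PATTERN = re.compile(
    r"(?P<a>\S+) "
    r"(?P<b>{item}(?:{sep}{item})*) "
    r"(?P<c>\S+)".format(item=ITEM, sep=re.escape(SEPARATOR))
)


def parse_fields(string):
    """Split the string into a, the list b, and c."""
    match = PATTERN.fullmatch(string)
    if match is None:
        raise ValueError("string does not match the expected layout: %r" % string)
    fields = match.groupdict()
    fields["b"] = fields["b"].split(SEPARATOR)
    return fields


fields = parse_fields("foo bar1, bar2 hello")
print(fields)  # {'a': 'foo', 'b': ['bar1', 'bar2'], 'c': 'hello'}
```

The trade-off is that the item pattern has to be spelled out explicitly, whereas the pipe only requires that the delimiter never occurs inside an item.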
