7

My Input is:

input = ['(var1, )', '(var2,var3)']

Expected Output is:

output = [('var1', ), ('var2','var3')]

Iterating over input and using eval/literal_eval on the tuple-strings is not possible:

>>> eval('(var1, )')
NameError: name 'var1' is not defined

How can I convert an item such as '(var1, )' to a tuple where the inner objects are treated as strings instead of variables?

Is there a simpler way than writing a parser or using regex?

Will
runDOSrun
  • 1
    http://stackoverflow.com/questions/1810109/parsing-a-string-which-represents-a-list-of-tuples – Maroun Feb 02 '16 at 12:55
  • 2
    @MarounMaroun That doesn't work, as I explained. In the question you linked, the tuples contain floats, which can be evaluated. Bare names like 'var1' can't. `literal_eval` will throw a `ValueError: malformed string` – runDOSrun Feb 02 '16 at 12:57

3 Answers

12

For each occurrence of a variable, eval searches the symbol table for the name of the variable. It's possible to provide a custom mapping that will return the key name for every missing key:

class FakeNamespace(dict):
    def __missing__(self, key):
        return key

Example:

In [38]: eval('(var1,)', FakeNamespace())
Out[38]: ('var1',)

In [39]: eval('(var2, var3)', FakeNamespace())
Out[39]: ('var2', 'var3')

Note: if the globals dictionary you pass does not contain a __builtins__ key, eval inserts a reference to the built-in namespace under that key before evaluating the expression. That means the expression can still reach built-in functions, exceptions and constants through that key. You can try to solve this by passing FakeNamespace(__builtins__=<None or some other value>) instead of just FakeNamespace(), but it won't make eval 100% safe (see the question "Python eval: is it still dangerous if I disable builtins and attribute access?").
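Putting this together for the whole list from the question, here is a minimal sketch. The helper name parse_tuple_strings and the choice of an empty dict for __builtins__ are assumptions made for illustration, not part of the original answer, and this is still not safe for untrusted input.

```python
class FakeNamespace(dict):
    """Mapping that returns the key itself for any name it does not contain."""

    def __missing__(self, key):
        return key


def parse_tuple_strings(items):
    # Hypothetical helper: evaluate each tuple-string in a fresh namespace.
    # __builtins__={} is an assumed hardening step; it does NOT make eval safe.
    return [eval(item, FakeNamespace(__builtins__={})) for item in items]


tuple_strings = ['(var1, )', '(var2,var3)']
result = parse_tuple_strings(tuple_strings)
print(result)
```

Because only one dictionary is passed, eval uses it for both globals and locals, so every unknown name is resolved through __missing__.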

vaultah
  • 1
    Cool approach! Although we should be careful sanitizing input if using `eval()`. – Will Feb 02 '16 at 13:02
  • 1
    Pass fake_globals as both globals and locals to the eval to mitigate attacks - `__builtins__` can't be accessed (nor any other Python name) if both dictionaries are changed: `In [15]: eval("__builtins__['zip']", FakeGlobals() ) Out[15]: zip` , `eval("__builtins__['zip']", FakeGlobals(), FakeGlobals() ) TypeError string indices must be integers` – jsbueno Feb 02 '16 at 13:14
  • (for the above comment: the approach is not absolute, but it might mitigate naive attacks) – jsbueno Feb 02 '16 at 13:20
5

Try this:

tuples = [tuple(filter(None, t.strip('()').strip().split(','))) for t in input]

For example:

In [16]: tuples = [tuple(filter(None, t.strip('()').strip().split(','))) for t in input]

In [17]: tuples
Out[17]: [('var1',), ('var2', 'var3')]

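We iterate through the list of tuple-strings and, for each one, strip the surrounding parentheses, split the remainder on commas into a list, and convert that list into a tuple. filter(None, ...) removes the empty string left behind by a trailing comma, as in '(var1, )'. Note that elements are not stripped individually, so '(var2, var3)' with a space after the comma would yield ' var3'.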
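If the input may contain spaces after the commas, a variant along these lines strips each element individually. The function name split_tuple_string is hypothetical, and the sketch assumes the names themselves contain no commas or parentheses.

```python
def split_tuple_string(text):
    # Drop surrounding whitespace and the enclosing parentheses.
    inner = text.strip().strip('()')
    # Split on commas, strip each piece, and skip pieces that are empty
    # (for example the one left by a trailing comma).
    return tuple(part.strip() for part in inner.split(',') if part.strip())


tuple_strings = ['(var1, )', '(var2, var3)']
result = [split_tuple_string(t) for t in tuple_strings]
print(result)
```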

Will
4

I like vaultah's solution. Here's another one with ast.literal_eval and re if eval is not an option:

>>> import re
>>> from ast import literal_eval
>>> [literal_eval(re.sub(r'(?<=\(|,)(\w+)(?=\)|,)', r'"\1"', x)) for x in input]
[('var1',), ('var2', 'var3')]
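The lookbehind in that pattern requires each name to directly follow '(' or ',', so a string such as '(var2, var3)' with a space after the comma would leave var3 unquoted. A looser sketch, assuming every run of word characters should become a string (numbers included), quotes all of them before calling literal_eval. The helper names here are hypothetical.

```python
import re
from ast import literal_eval


def quote_names(text):
    # Wrap every run of word characters in double quotes.
    return re.sub(r'\w+', r'"\g<0>"', text)


def parse_tuple_string(text):
    # literal_eval only accepts literals, so the quoted text is safe to parse.
    return literal_eval(quote_names(text))


tuple_strings = ['(var1, )', '(var2, var3)']
result = [parse_tuple_string(t) for t in tuple_strings]
print(result)
```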
timgeb