Manual parsing is error-prone and hard to make general, and eval-based approaches fail when the keys are Python keywords. The currently accepted answer breaks if values contain spaces, commas, or colons, and the eval answer can't handle keys like if or for.
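For instance, anything that hands the raw string to eval fails at the compilation step as soon as a keyword appears in key position, no matter what globals or locals are supplied (the exact error text varies by Python version, but it is always a SyntaxError):

>>> eval('{if: 3}')
SyntaxError: invalid syntax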
Instead, we can tokenize the input as a series of Python tokens and replace NAME tokens with STRING tokens, then untokenize to build a valid dict literal. From there, we can just call ast.literal_eval.
import ast
import io
import tokenize

def parse(x):
    # Tokenize the input string, turn every NAME token into an equivalent
    # STRING token (repr adds the quotes), and keep all other tokens as-is.
    tokens = tokenize.generate_tokens(io.StringIO(x).readline)
    modified_tokens = (
        (tokenize.STRING, repr(token.string)) if token.type == tokenize.NAME else token[:2]
        for token in tokens)
    # Untokenizing gives back a dict literal with properly quoted keys,
    # which ast.literal_eval can evaluate safely.
    fixed_input = tokenize.untokenize(modified_tokens)
    return ast.literal_eval(fixed_input)
Then parse("{a:'b', c:'d',e:''}") == {'a':'b', 'c':'d', 'e':''}, and no problems occur with keywords as keys or special characters in the values:
>>> parse('{a: 2, if: 3}')
{'a': 2, 'if': 3}
>>> parse("{c: ' : , '}")
{'c': ' : , '}
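To see what the token rewriting actually produces, you can print the intermediate string before it reaches ast.literal_eval. This is just an illustration of the same steps parse performs; the exact whitespace that untokenize emits can vary, but the result is always a dict literal with quoted keys:

import ast
import io
import tokenize

src = "{a: 2, if: 3}"
tokens = tokenize.generate_tokens(io.StringIO(src).readline)
fixed = tokenize.untokenize(
    (tokenize.STRING, repr(t.string)) if t.type == tokenize.NAME else t[:2]
    for t in tokens)
print(fixed)                    # something like: {'a' : 2 , 'if' : 3 }
print(ast.literal_eval(fixed))  # {'a': 2, 'if': 3}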