Ok,

If I have a string, say x = 'Hello World!', how can I parse Python strings from it? I know that I can use a regex like "[^"]*", but how can I parse all valid Python strings? The solution doesn't have to be a regex, but if that is possible, great.

For example:

  • x = 'Hello World!' => Hello World!
  • x = '\'Stack Overflow\'' => \'Stack Overflow\'
  • x = 'x=\"x=\'Python\n\'\"' => x=\"x=\'Python\n\'\"

Sorry if I haven't explained this clearly; it is not easy as a non-native speaker.

Hannes Karppila

1 Answer

Well, the simplest way would be to use ast.literal_eval():

>>> from ast import literal_eval
>>> literal_eval(r"'Hello World!'")
'Hello World!'
>>> literal_eval(r"'\'Stack Overflow\''")
"'Stack Overflow'"
>>> literal_eval(r"""'x=\"x=\'Python\n\'\"'""")
'x="x=\'Python\n\'"'

But if you want to extract Python strings from a string containing one or several full Python statements, you can do:

import ast

def get_string(s):
    # Walk every node of the parsed source and yield each string literal's value.
    for it in ast.walk(ast.parse(s)):
        if isinstance(it, ast.Str):
            yield it.s

Here are the results:

>>> for i in get_string(r"'Hello World!'"): print i
... 
Hello World!

For the following match, the outer string has to be a raw string so that the backslashes reach the parser intact, and to get the result you're expecting the inner string has to be a raw string as well:

>>> for i in get_string(r"x = '\'Stack Overflow\''"): print i
... 
'Stack Overflow'
>>> for i in get_string(r"x = r'\'Stack Overflow\''"): print i
... 
\'Stack Overflow\'

For the last match, to get the result you're expecting, the inner string has to be set up as a raw string:

>>> for i in get_string(r"""x = 'x=\"x=\'Python\n\'\"'"""): print i
... 
x="x='Python
'"
>>> for i in get_string(r"""x = r'x=\"x=\'Python\n\'\"'"""): print i
... 
x=\"x=\'Python\n\'\"
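The transcripts above use the Python 2 print statement and ast.Str. On Python 3.8 and later ast.Str is deprecated (and it is removed in 3.12), so a rough Python 3 equivalent might look like the sketch below. The name get_strings is only illustrative, and filtering ast.Constant nodes on str values is an assumption about what "Python strings" should cover (bytes literals are skipped).

```python
import ast


def get_strings(source):
    """Yield the value of every str literal found in `source` (hypothetical helper)."""
    for node in ast.walk(ast.parse(source)):
        # ast.Constant also covers numbers, bytes, None, etc., so keep only str values.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            yield node.value


for value in get_strings(r"'Hello World!'"):
    print(value)  # Hello World!
```

As with the original function, the values come back already evaluated, so escapes are only preserved when the inner literal is itself a raw string.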

In the end, even though a (non-regular) regex could do the job, it will always be a better option to parse Python strings with the parser Python itself uses, because then you are relying on the same tool that reads Python strings in the first place!
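If what you want is the literal exactly as it is written in the source (escapes untouched), the standard library's tokenize module is another way to lean on Python's own machinery. This is a different technique from the ast approach above, shown only as a sketch; string_literals is a hypothetical helper name, and on Python 3.12+ f-strings are tokenized differently, so they would not be reported here.

```python
import ast
import io
import tokenize


def string_literals(source):
    """Yield (source_text, value) for each string literal token (hypothetical helper)."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING:
            # tok.string is the literal as written, quotes and prefix included;
            # ast.literal_eval turns it into the actual value.
            yield tok.string, ast.literal_eval(tok.string)


for text, value in string_literals(r"x = '\'Stack Overflow\''"):
    print(text)
    print(value)
```

The first item of each pair keeps the backslashes and surrounding quotes, so stripping the quotes from it gives the "unevaluated" text the question asks for, without having to make the inner literal raw.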

zmo