
Note that I cannot modify s when it is created, and I am ideally looking for a method via ast.

The following expression

import ast
s = 'func(arg="\\\\my\\network\\drive")'
ast.parse(s).body[0].value.keywords[0].value.s

will return

'\\my\network\\drive'

Is there any way to get around this without manually modifying s as follows?

ast.parse(s.replace('\\', '\\\\')).body[0].value.keywords[0].value.s

The expected output is:

"\\\\my\\network\\drive"
Alexander McFarlane
  • A decent workaround is `s.encode("unicode_escape")`, as this appears to be a duplicate of https://stackoverflow.com/q/18707338/4013571, since there is no built-in method in `ast`. Flagging as a duplicate, as this will help less experienced coders using `ast` – Alexander McFarlane Dec 03 '18 at 19:17
  • `s.encode("unicode_escape")` isn't going to help you with things like newlines or escaped quotation marks. If your input string spans two lines, `unicode-escape` will convert that to backslash-n. If your input has escaped quotation marks, `unicode-escape` will have no idea whether any of those should be escaped, and it won't escape any of them. – user2357112 Dec 03 '18 at 19:25
  • would a better approach be `.replace()` then? In my particular instance the input string `s` will always be a single function expression – Alexander McFarlane Dec 03 '18 at 20:08
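
The difference between the two workarounds discussed in these comments can be checked directly. The following is only a sketch: the helper names `double_backslashes`, `unicode_escape`, and `parses` are made up for illustration, and the multi-line input is a hypothetical variant of the question's example. On the single-line input both workarounds produce the same text, but `unicode_escape` also rewrites real newlines, which breaks source that spans several lines.

```python
import ast


def double_backslashes(source):
    # Hypothetical helper: the .replace() workaround from the question.
    return source.replace("\\", "\\\\")


def unicode_escape(source):
    # Hypothetical helper: the unicode_escape workaround from the comments.
    return source.encode("unicode_escape").decode("ascii")


def parses(source):
    # True if ast can parse the text as Python source.
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


single_line = 'func(arg="\\\\my\\network\\drive")'
# Assumed multi-line variant of the same call, containing real newlines.
multi_line = 'func(\n    arg="\\\\my\\network\\drive",\n)'

# On a single line the two workarounds agree.
print(double_backslashes(single_line) == unicode_escape(single_line))  # True

# With real newlines, only the .replace() version is still valid source.
print(parses(double_backslashes(multi_line)))  # True
print(parses(unicode_escape(multi_line)))      # False
```

Neither workaround can decide whether a quotation mark in `s` was originally escaped, so both remain best-effort patches rather than a true inverse of escape processing.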

1 Answer


The ast module is, in fact, parsing the input correctly. You've misunderstood something about the string representations involved. Depending on what it turns out you actually need, the solution might be to use a raw string literal (note the r):

s = r'func(arg="\\\\my\\network\\drive")'

As things stand, the string literal you've written represents the Python source code

func(arg="\\my\network\drive")

and not the source code

func(arg="\\\\my\\network\\drive")

and ast is processing the \n escape in the way the Python syntax says to process it.
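
A quick way to see this is to print both strings and parse each one. This is a sketch assuming Python 3.8 or later, where string literals parse to `ast.Constant` and the text is read from `.value` (the question's `.s` attribute is the older spelling); `first_keyword_value` is a helper name introduced here for illustration.

```python
import ast

s = 'func(arg="\\\\my\\network\\drive")'     # ordinary literal, as in the question
raw = r'func(arg="\\\\my\\network\\drive")'  # raw literal, as suggested above

# What each string actually contains, i.e. the source code that ast sees.
print(s)    # func(arg="\\my\network\drive")
print(raw)  # func(arg="\\\\my\\network\\drive")


def first_keyword_value(source):
    # Hypothetical helper: value of the first keyword argument of a call expression.
    call = ast.parse(source).body[0].value
    return call.keywords[0].value.value  # ast.Constant.value (Python 3.8+)


print(repr(first_keyword_value(s)))    # '\\my\network\\drive'
print(repr(first_keyword_value(raw)))  # '\\\\my\\network\\drive'
```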


If you're hoping for some way to undo string escape processing on an already-processed string, this is impossible. String escape processing is not an injective mapping from inputs to outputs. There is no way to recover the unprocessed form; you need to avoid string escape processing in the first place.
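
To illustrate why the mapping is not injective, here is a small sketch in which several different spellings of a string literal all evaluate to the same one-character string. The particular spellings are chosen only as examples.

```python
import ast

# Five different pieces of source text, written as raw strings so that
# each one reaches the parser exactly as spelled here.
spellings = [
    r'"A"',
    r'"\x41"',
    r'"\u0041"',
    r'"\101"',
    r'"\N{LATIN CAPITAL LETTER A}"',
]

# Escape processing maps every spelling to the same value.
values = {ast.literal_eval(source) for source in spellings}
print(values)  # {'A'}
```

Given only the result `'A'`, there is nothing left to indicate which of the spellings produced it.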

user2357112
  • I am not getting a raw string type on input so I don't think this can be an answer – Alexander McFarlane Dec 03 '18 at 18:55
  • @AlexanderMcFarlane: How *are* you taking input, then? Most ways of taking input automatically behave like a raw string literal already. You've done *something* wrong to get to this point. – user2357112 Dec 03 '18 at 18:56
  • I do see your point - *"You've done something wrong to get to this point."* - I am taking an input from a script created by someone else that will contain `s` as valid python code. This cannot be modified. It is not a valid answer to go back to the source and say *"your valid python code isn't working because of the way I parse it"*. That is part of the challenge here – Alexander McFarlane Dec 03 '18 at 18:57
  • @AlexanderMcFarlane: The input you've given *isn't* correct Python source code. It's using backslash escaping incorrectly. Trying to fix this in the parsing phase is a bad idea. – user2357112 Dec 03 '18 at 18:59
  • I would argue that `python -c 'func(arg="\\\\my\\network\\drive")'` is perfectly valid and that `ast` should be able to parse – Alexander McFarlane Dec 03 '18 at 19:00
  • @AlexanderMcFarlane: Shell single-quotes behave like a raw string literal. You wouldn't be in this position if your input had come from a single-quoted command line argument. – user2357112 Dec 03 '18 at 19:02
  • Valid point but similarly, `func(arg="\\\\my\\network\\drive")` will also work within a python file when executed. It will also work in the python interpreter. Nevertheless, it stands that your answer cannot be a correct answer if it results in changing the source. Is the true answer that there is no method in `ast` and that a redirection to *"patching up the original string"* answer is the only workaround? – Alexander McFarlane Dec 03 '18 at 19:05
  • @AlexanderMcFarlane: There is no workaround. `"func(arg="\\\\my\\network\\drive")"` does not represent `func(arg="\\\\my\\network\\drive")`. String escape processing is not reversible. – user2357112 Dec 03 '18 at 19:11
  • *does not represent* - totally agreed. Although that does not solve the problem. I think a correct answer is probably a combination of your warnings and this https://stackoverflow.com/a/26867674/4013571 – Alexander McFarlane Dec 03 '18 at 19:15
  • @AlexanderMcFarlane: Be aware that attempts to undo string escape processing are always going to be unreliable. Nothing can distinguish `'"a \" \" b"'` and `'"a " " b"'`. – user2357112 Dec 03 '18 at 19:19
  • yeah thanks for the input - I was really hoping that there was something in `ast` to handle. Although I did learn from our discussion that it is impossible to precisely revert string encoding and that python 3 is again superior here! – Alexander McFarlane Dec 03 '18 at 19:21
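
The quotation-mark example from the comments above can be verified with a short sketch. The variable names are illustrative only. The two non-raw literals produce identical strings, even though the source texts they were presumably meant to represent parse to different values.

```python
import ast

escaped_quotes = '"a \" \" b"'  # intended source: "a \" \" b"
plain_quotes = '"a " " b"'      # intended source: "a " " b"

# After escape processing the two strings are identical.
print(escaped_quotes == plain_quotes)  # True

# Written as a raw literal, the first source keeps its escaped quotes.
intended_escaped = r'"a \" \" b"'

# One string containing two quotation marks.
print(repr(ast.literal_eval(intended_escaped)))  # 'a " " b'

# Two adjacent literals, concatenated by the parser.
print(repr(ast.literal_eval(plain_quotes)))      # 'a  b'
```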