
When parsing /proc/self/mountinfo on Linux, some fields of a line (each line describes one mount) may well contain UTF-8 encoded characters. Since the mountinfo line format separates fields by spaces, mountinfo escapes at least the space character and \ (backslash) as the literal character sequences "\040" and "\134". How can I convert an escaped field value (the path "/tmp/a\ ", which arrives as the Python string '/tmp/a\\134\\040') back into the unescaped string?

Is there a better way than the following rather involved one (from https://stackoverflow.com/a/26311382)? That is, one with fewer chained encode/decode steps?

>>> s='/tmp/a\\134\\040'
>>> s.encode().decode('unicode-escape').encode('latin-1').decode('utf-8')
'/tmp/a\\ '
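For comparison, here is a minimal sketch of a regex-based alternative. It assumes that mountinfo only ever escapes single bytes as a backslash followed by exactly three octal digits, which matches the "\040" and "\134" examples above but is not verified here against the kernel source. The helper name `unescape_mountinfo_field` is hypothetical. It trades the four-step chain for one encode, one substitution, and one decode, and it ignores other Python-style escapes such as `\n` or `\x41` that `unicode-escape` would also interpret.

```python
import re

# Assumption: an escape is a backslash followed by exactly three octal
# digits whose value fits into a single byte (000 to 377).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def unescape_mountinfo_field(field: str) -> str:
    """Hypothetical helper: undo octal escapes in one mountinfo field."""
    # Work on bytes so that an escaped byte can never split or corrupt a
    # multi-byte UTF-8 sequence elsewhere in the field.
    raw = field.encode("utf-8")
    # Replace each \ooo sequence with the single byte it stands for.
    # re.sub does not rescan its own output, so "\134040" stays "\040".
    raw = _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return raw.decode("utf-8")


print(repr(unescape_mountinfo_field("/tmp/a\\134\\040")))  # '/tmp/a\\ '
```

Whether this counts as "better" is a matter of taste: it needs a regex, but it makes the handled escape syntax explicit.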

PS: Don't ask why anyone sane would use such path names; this is just for illustrative purposes ;)

TheDiveO
  • I don't think you can do better than that. I say be happy you don't have to use regex! – lenz Jun 18 '20 at 17:47
  • If it helps, you can omit the last argument `'utf8'` to the final `.decode()`, like you did for the first `.encode()`. – lenz Jun 18 '20 at 17:48

0 Answers