
How can I split a multiline string that contains backslashes into separate lines without the backslashes being turned into escape characters?

Here is an example input I'm dealing with:

strInput = '''signalArr(0)="ASCB D\axx\bxx\fxx\nxx"
signalArr(1)="root\rxx\txx\vxx"'''

I've tried the following, to turn each single backslash into a double one so that the backslash escape takes precedence and the following character is treated literally:

def doubleBackslash(inputString):
    inputString.replace('\\','\\\\')
    inputString.replace('\a','\\a')
    inputString.replace('\b','\\b')
    inputString.replace('\f','\\f')
    inputString.replace('\n','\\n')
    inputString.replace('\r','\\r')
    inputString.replace('\t','\\t')
    inputString.replace('\v','\\v')
    return inputString

strInputProcessed = doubleBackslash(strInput)

I'd like to get:

lineList = strInputProcessed.splitlines()

>> ['signalArr(0)="ASCB D\axx\bxx\fxx\nxx"','signalArr(1)="root\rxx\txx\vxx"']

What I got:

>> ['signalArr(0)="ASCB D\x07xx\x08xx', 'xx', 'xx"', 'signalArr(1)="root', 'xx\txx', 'xx"']
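Two things go wrong in the attempt above. The sketch below assumes Python 3, and the name double_backslash is a hypothetical fixed variant of the question's function. First, str.replace returns a new string instead of modifying the original, so every result in doubleBackslash is discarded. Second, even with the results reassigned, the escapes are already gone once the literal is parsed: the \n inside the quotes and the real line break between the two lines are the same character, so re-escaping merges both lines into one.

```python
# Sample input from the question: a normal (non-raw) triple-quoted literal,
# so \a, \b, \f, \n, \r, \t, \v become control characters at parse time.
strInput = '''signalArr(0)="ASCB D\axx\bxx\fxx\nxx"
signalArr(1)="root\rxx\txx\vxx"'''


def double_backslash(input_string):
    """Hypothetical fixed version of doubleBackslash: reassigns each result."""
    replacements = [
        ('\\', '\\\\'),  # real backslashes first, so later output is not doubled
        ('\a', '\\a'),
        ('\b', '\\b'),
        ('\f', '\\f'),
        ('\n', '\\n'),
        ('\r', '\\r'),
        ('\t', '\\t'),
        ('\v', '\\v'),
    ]
    for old, new in replacements:
        # str.replace returns a new string, so keep the result
        input_string = input_string.replace(old, new)
    return input_string


# The escaped "\n" and the real line break are now the same character.
print(strInput.count('\n'))  # 2

processed = double_backslash(strInput)
print(len(processed.splitlines()))  # 1: both lines were merged
```

This is why the escapes have to be preserved where the literal is written rather than repaired afterwards.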

1 Answer


Try storing your input as a raw string. In a raw string a backslash is not treated as an escape character, so '\n' stays as a backslash followed by n instead of becoming a newline:

>>> var = r'''abc\n
... cba'''
>>> print(var)
abc\n
cba
>>> var.splitlines()
['abc\\n', 'cba']

(Note the r prefix before the opening quotes; it marks the string literal as raw.)
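A short Python 3 sketch of the difference the raw prefix makes (the variable names are illustrative): in the normal literal \n is one newline character, while in the raw literal the backslash and the n are two separate characters, so splitlines has nothing to split on.

```python
normal = 'abc\ncba'  # \n is parsed as a single newline character
raw = r'abc\ncba'    # backslash and 'n' are kept as two characters

print(len(normal), len(raw))  # 7 8
print(normal.splitlines())    # ['abc', 'cba']
print(raw.splitlines())       # ['abc\\ncba']
print(raw == 'abc\\ncba')     # True
```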

As an extra: if you need to escape an existing string, you can use encode with the 'string-escape' codec (Python 2 only) instead of the chain of replace calls above.

>>> s = 'abc\nabc\nabc'
>>> s.encode('string-escape')
'abc\\nabc\\nabc'

Similarly, if needed, decoding with the same codec undoes the escaping:

>>> s.encode('string-escape').decode('string-escape')
'abc\nabc\nabc'
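The 'string-escape' codec exists only in Python 2. In Python 3 a rough equivalent is the built-in 'unicode_escape' codec; it converts between str and bytes, hence the extra ASCII encode/decode steps in this sketch. Note that it writes control characters such as \a and \b as \x07 and \x08 rather than in their short form, and it also escapes real line breaks, so it does not replace writing the input as a raw string.

```python
s = 'abc\nabc\nabc'

# str -> escaped bytes -> str; the escaped form is pure ASCII.
escaped = s.encode('unicode_escape').decode('ascii')
print(escaped)        # abc\nabc\nabc
print(repr(escaped))  # 'abc\\nabc\\nabc'

# Undo the escaping: str -> bytes -> str with escapes interpreted.
restored = escaped.encode('ascii').decode('unicode_escape')
print(restored == s)  # True

# Control characters without a short escape in this codec come out as \xNN.
print('\a\b'.encode('unicode_escape'))  # b'\\x07\\x08'
```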

Finally, here it is applied to your input:

>>> strInput = r'''signalArr(0)="ASCB D\axx\bxx\fxx\nxx"
... signalArr(1)="root\rxx\txx\vxx"'''
>>> strInput.splitlines()
['signalArr(0)="ASCB D\\axx\\bxx\\fxx\\nxx"', 'signalArr(1)="root\\rxx\\txx\\vxx"']

The doubled backslashes appear only because the list is displayed using each element's repr, which escapes them; in memory each one is a single \ character. Iterating over a string confirms this, since it yields no extra \ characters:

>>> s = r'\a\b\c'
>>>
>>> for c in s:
...  print(c)
\
a
\
b
\
c
>>> list(s)
['\\', 'a', '\\', 'b', '\\', 'c']
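If the text actually comes from a file rather than a string literal (an assumption; the question only shows a literal), no raw prefix is needed at all: escape sequences are interpreted only in string literals in source code, and text read from a file keeps its backslashes. A minimal Python 3 sketch, using a temporary file as a stand-in for the real data:

```python
import os
import tempfile

# Stand-in data, written with a raw literal so the backslashes survive.
text = r'''signalArr(0)="ASCB D\axx\bxx\fxx\nxx"
signalArr(1)="root\rxx\txx\vxx"'''

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write(text)
    path = f.name

try:
    # Reading a file never interprets escape sequences.
    with open(path) as f:
        line_list = f.read().splitlines()
finally:
    os.remove(path)

print(line_list)
```

The resulting list matches the one produced from the raw-string literal.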