Python: Replacing backslashes to avoid escape sequences in string

Question

I'm trying to replace the single backslashes I get within a string with double backslashes, because sometimes the "backslash+character" combination creates an escape sequence. I have tried various ways (mostly from other Stack Overflow questions), but nothing has given me the correct result so far.

Example s = "\aa, \bb, \cc, \dd"

string.replace(s,"\\","\\\\")

replaces the first a and b with special characters (I can't get the exact result to paste here):

@a,@b,\\cc,\\dd

print s.encode("string_escape")

produces

\x07a,\x08b,\\cc,\\dd

(same for "unicode-escape")

using this function

escape_dict={'\a':r'\a',
           '\b':r'\b',
           '\c':r'\c',
           '\f':r'\f',
           '\n':r'\n',
           '\r':r'\r',
           '\t':r'\t',
           '\v':r'\v',
           '\'':r'\'',
           '\"':r'\"',
           '\0':r'\0',
           '\1':r'\1',
           '\2':r'\2',
           '\3':r'\3',
           '\4':r'\4',
           '\5':r'\5',
           '\6':r'\6',
           '\7':r'\7',
           '\8':r'\8',
           '\9':r'\9'}

def raw(text):
    """Returns a raw string representation of text"""
    new_string=''
    for char in text:
        try: new_string+=escape_dict[char]
        except KeyError: new_string+=char
    return new_string

produces

\7a,\bb,\cc,\dd

and using this function

import re
import codecs

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')

    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

returns the string with special characters again

 @a,@b,\\cc,\\dd

The actual strings I need to convert would be something like "GroupA\Group2\Layer1".

If you have this kind of problem with escape sequences in your input data, you should really get that data fixed at the source. – Klaus D., Feb 19 '16 at 11:05
Thanks. I'm not sure whether (or how) I can change the input, as it's the string output from a tool parameter (a value table in ArcGIS) and I need the string representation. My question is more or less: why won't any of the methods above work on a general example? – rr5577, Feb 19 '16 at 11:36
The question does not make sense as asked. If you "are getting" strings from an outside source, then it either **actually contains** a backslash followed by whatever, or else it **actually contains** a special character. It cannot contain an "escape sequence"; those are **only** relevant to *string literals in your code*. If your input is **supposed to** contain e.g. backslash followed by lowercase a, but **actually** contains a BEL character, then you will have to replace it manually, following your own heuristics. — Karl Knechtel, Aug 08 '22 at 01:48
There is **no way to know in general** what the "unescaped" version should be, because there are *multiple possible* string literals for any given string (except the empty string, assuming you don't care which quotes are used). `a = '\x20'` and `a = ' '` cause `a` to have **the same value**. Suppose you know that someone else's broken code will take in an input of backslash, lowercase x, two, zero, and inappropriately convert it to a space; suppose you receive a space from that process. It is **not possible to tell** whether that conversion happened or not. Information was lost. — Karl Knechtel, Aug 08 '22 at 01:51
"replaces the first a and b with special characters" No, it does not. They **already were** special characters. — Karl Knechtel, Aug 08 '22 at 01:53

score 2 · Accepted Answer · edited May 23 '17 at 10:28


In general I agree with Klaus's comment, though fixing the data at the source is not always possible.
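
To make the point from the comments concrete, here is a minimal sketch. It is written for Python 3, which is an assumption, since the question uses Python 2 syntax. It shows that an escape sequence is resolved when the literal is compiled, that text arriving at runtime keeps its backslashes, and that two different literals can yield the same value. The `io.StringIO` object is only a stand-in for an external source such as a file or a tool parameter.

```python
import io

# Escape sequences are resolved when the literal is compiled.
literal = "\aa"        # 2 characters: BEL (0x07), then "a"
raw_literal = r"\aa"   # 3 characters: backslash, "a", "a"
print(len(literal), len(raw_literal))   # 2 3

# Text that arrives at runtime is not re-interpreted. The StringIO object
# stands in for an external source (file, tool parameter, ...).
source = io.StringIO("GroupA\\Group2\\Layer1")
received = source.readline()
print(received)               # GroupA\Group2\Layer1
print(received.count("\\"))   # 2

# Different literals can denote the same value, so the original spelling
# cannot be recovered from the string alone.
print("\x20" == " ")          # True
```

If the received string really contains backslashes, as in this sketch, nothing needs repairing. If it contains control characters instead, the damage happened before the string reached your code.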

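The quick answer is to write the literal as a raw string: r'\aa, \bb, \cc, \dd'. The r prefix keeps each backslash as a literal backslash instead of starting an escape sequence.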
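
A short sketch of the raw-string route, assuming Python 3 and assuming you control the literal in your own source code. Once the backslashes are really in the string, doubling them is a plain replace.

```python
# Raw literal: every backslash stays a literal backslash.
s = r"\aa, \bb, \cc, \dd"
print(s)         # \aa, \bb, \cc, \dd

# With real backslashes in the string, doubling them is a plain replace.
doubled = s.replace("\\", "\\\\")
print(doubled)   # \\aa, \\bb, \\cc, \\dd
```

Note that the r prefix only affects literals written in source code. It cannot be applied afterwards to a string that was received from somewhere else.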

I found more information here.

The less happy answer, if that isn't a possibility, is to replace the resulting special characters yourself:

s = '\aa, \bb, \cc, \dd'
# '\a' has already become the BEL character (\x07); put the two-character text back
s = s.replace("\x07", "\\a")
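
The same workaround can be extended to the other single-letter escapes. The following is only a sketch under assumptions: it targets Python 3, and `CONTROL_TO_TEXT` and `restore_backslashes` are hypothetical names made up for this illustration. It is a heuristic, because it cannot tell a control character that came from a mangled backslash apart from one that was meant to be there.

```python
# Control characters produced by Python's single-letter escapes, mapped back
# to the two-character text (backslash + letter) that would have produced them.
CONTROL_TO_TEXT = {
    "\a": r"\a",
    "\b": r"\b",
    "\f": r"\f",
    "\n": r"\n",
    "\r": r"\r",
    "\t": r"\t",
    "\v": r"\v",
}


def restore_backslashes(text):
    """Hypothetical helper: rewrite known control characters as backslash + letter."""
    return "".join(CONTROL_TO_TEXT.get(ch, ch) for ch in text)


# "\a" and "\b" are already control characters here; the escaped "\\c" and
# "\\d" are ordinary backslashes followed by a letter.
damaged = "\aa, \bb, \\cc, \\dd"
print(restore_backslashes(damaged))   # \aa, \bb, \cc, \dd
```

A genuine newline or tab in the input would be rewritten as well, so this only makes sense when you know the data should not contain such characters.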

answered Feb 19 '16 at 11:18 by unflores · edited May 23 '17 at 10:28 by Community

Thanks, I think I can use the "less happy answer" as a workaround for the moment. – rr5577 Feb 19 '16 at 12:17
