I'm trying to replace the single backslashes I get within a string with double backslashes, because sometimes the "backslash + character" combination creates an escape sequence. I have tried various approaches (mostly from other Stack Overflow questions), but nothing has given me the correct result so far.

Example s = "\aa, \bb, \cc, \dd"


string.replace(s,"\\","\\\\")

replaces the first a and b with special characters (I can't get the exact result to paste here, so @ stands in for them):

@a,@b,\\cc,\\dd

print s.encode("string_escape")

produces

\x07a,\x08b,\\cc,\\dd

(same for "unicode-escape")


using this function

escape_dict={'\a':r'\a',
           '\b':r'\b',
           '\c':r'\c',
           '\f':r'\f',
           '\n':r'\n',
           '\r':r'\r',
           '\t':r'\t',
           '\v':r'\v',
           '\'':r'\'',
           '\"':r'\"',
           '\0':r'\0',
           '\1':r'\1',
           '\2':r'\2',
           '\3':r'\3',
           '\4':r'\4',
           '\5':r'\5',
           '\6':r'\6',
           '\7':r'\7',
           '\8':r'\8',
           '\9':r'\9'}

def raw(text):
    """Returns a raw string representation of text"""
    new_string=''
    for char in text:
        try: new_string+=escape_dict[char]
        except KeyError: new_string+=char
    return new_string

produces

\7a,\bb,\cc,\dd

and using this function

import re
import codecs

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U........      # 8-digit hex escapes
    | \\u....          # 4-digit hex escapes
    | \\x..            # 2-digit hex escapes
    | \\[0-7]{1,3}     # Octal escapes
    | \\N\{[^}]+\}     # Unicode characters by name
    | \\[\\'"abfnrtv]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')

    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

returns the string with special characters again

 @a,@b,\\cc,\\dd

The actual strings I need to convert would be something like "GroupA\Group2\Layer1"

rr5577
  • If you have this kind of problem with escape sequences in your input data, you should actually get that data fixed at the source. – Klaus D. Feb 19 '16 at 11:05
  • Thanks. I'm not sure (how) I can change the input, as it's the string output from a tool parameter (value table in ArcGIS) and I need the string representation. My question is more or less: why won't any of the methods above work on a general example? – rr5577 Feb 19 '16 at 11:36
  • The question does not make sense as asked. If you "are getting" strings from an outside source, then it either **actually contains** a backslash followed by whatever, or else it **actually contains** a special character. It cannot contain an "escape sequence"; those are **only** relevant to *string literals in your code*. If your input is **supposed to** contain e.g. backslash followed by lowercase a, but **actually** contains a BEL character, then you will have to replace it manually, following your own heuristics. – Karl Knechtel Aug 08 '22 at 01:48
  • There is **no way to know in general** what the "unescaped" version should be, because there are *multiple possible* string literals for any given string (except the empty string, assuming you don't care which quotes are used). `a = '\x20'` and `a = ' '` cause `a` to have **the same value**. Suppose you know that someone else's broken code will take in an input of backslash, lowercase x, two, zero, and inappropriately convert it to a space; suppose you receive a space from that process. It is **not possible to tell** whether that conversion happened or not. Information was lost. – Karl Knechtel Aug 08 '22 at 01:51
  • "replaces the first a and b with special characters" No, it does not. They **already were** special characters. – Karl Knechtel Aug 08 '22 at 01:53
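
The point made in the comments above can be checked directly. The following is a minimal sketch, assuming Python 3 syntax (the question's own code is Python 2); it only inspects the example string from the question and adds no new API.

```python
# The literal from the question, with \c and \d written as \\c and \\d.
# Those two are not recognised escapes, so the resulting value is identical,
# and newer Python versions warn about the unescaped form.
s = "\aa, \bb, \\cc, \\dd"

print(repr(s))        # the BEL and BS characters show up as \x07 and \x08
print(len(s))         # 16
print(ord(s[0]))      # 7: one BEL character, not a backslash followed by 'a'
print(s.count("\\"))  # 2: only the backslashes before 'c' and 'd' exist
```

Because the first two backslashes never make it into the string, no replacement of "\\" can find them.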

1 Answer

In general I agree with Klaus's comment. Though that's not always a possibility.

The quick answer is that you can do this: r'\aa, \bb, \cc, \dd'.
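
As a small sketch of the raw-string approach (Python 3 syntax assumed; the variable names `doubled` and `path` are only illustrative), the backslashes survive in the value and can then be doubled with an ordinary replace:

```python
s = r"\aa, \bb, \cc, \dd"          # raw literal: backslashes are kept as typed
doubled = s.replace("\\", "\\\\")  # every single backslash becomes two
print(doubled)                     # \\aa, \\bb, \\cc, \\dd

# The same idea applied to the kind of string mentioned in the question.
path = r"GroupA\Group2\Layer1"
print(path.replace("\\", "\\\\"))  # GroupA\\Group2\\Layer1
```

This only helps when the string is written as a literal in your own code; a string that arrives from another program is not processed for escape sequences in the first place.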

I found more information here.

The less happy answer, if that isn't a possibility, is to do the replacements yourself, like this:

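s = '\aa, \bb, \cc, \dd'
# "\x07" is the BEL character that '\a' turned into; put backslash + 'a' back.
# str.replace returns a new string, so keep the result.
s = s.replace("\x07", "\\a")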
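
The single replacement above could be extended to every single-character escape. The following is a sketch only, in Python 3 syntax; `CONTROL_TO_ESCAPE` and `restore_backslashes` are hypothetical names, not part of any library. It is a heuristic: a control character that was genuinely meant to be in the data gets rewritten as well.

```python
# Hypothetical mapping: each control character that a single-character escape
# can produce, mapped back to the two-character text "backslash + letter".
CONTROL_TO_ESCAPE = {
    "\a": "\\a",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\v": "\\v",
}

def restore_backslashes(text):
    """Replace known control characters with backslash + letter (heuristic)."""
    return "".join(CONTROL_TO_ESCAPE.get(ch, ch) for ch in text)

# Same value as the question's example; \\c and \\d avoid invalid-escape warnings.
s = "\aa, \bb, \\cc, \\dd"
print(restore_backslashes(s))   # \aa, \bb, \cc, \dd
```

Unlike the `escape_dict` in the question, this mapping has no octal entries. There, '\a' and '\7' are the same character, so the two keys collide and the later value wins, which is why that attempt printed \7a.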
unflores