I need a way for my function to take in a string at runtime and remove the backslashes while KEEPING the character each backslash is prepended to. So for \a I must get a. This must also work for sequences that are not recognized escapes, like \e -> e.

I've scoured the internet looking for a general solution to this problem, but there does not appear to be one. The best solution I have found builds the string from scratch using a dictionary, as in "How to prevent automatic escaping of special characters in Python":

escape_dict={'\a':r'\a',
         '\b':r'\b',
         '\c':r'\c',
         '\f':r'\f',
         '\n':r'\n',
         '\r':r'\r',
         '\t':r'\t',
         '\v':r'\v',
         '\'':r'\'',
         '\"':r'\"',
         '\0':r'\0',
         '\1':r'\1',
         '\2':r'\2',
         '\3':r'\3',
         '\4':r'\4',
         '\5':r'\5',
         '\6':r'\6',
         '\7':r'\7',
         '\8':r'\8',
         '\9':r'\9'}
def raw(text):
    """Returns a raw string representation of the string"""
    new_string=''
    for char in text:
        try: 
            new_string += escape_dict[char]
        except KeyError: 
            new_string += char
    return new_string

However, this fails in general because of conflicts between the escaped numbers and the escaped letters. Using three-digit numbers like \001 instead of \1 also fails, because the output will contain additional digits, which defeats the purpose: I should simply remove the backslash.

Other proposed solutions based on encodings, like the one found in "Process escape sequences in a string in Python", also do not work, because they just convert the escape characters into their hex codes: \a gets converted to \x07. Even if I were to somehow remove this, the character a is still lost.
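The conflict between escaped numbers and escaped letters can be checked directly. This is only a small illustration; the variable names `bell_variants`, `distinct` and `unknown` are made up for the example:

```python
# Four different spellings in source code, one and the same character at runtime.
bell_variants = ['\a', '\7', '\007', '\x07']
distinct = set(bell_variants)
print(len(distinct))  # all four literals collapse to a single value

# A sequence Python does not recognize keeps its backslash, so it is still
# two characters long. A raw string is used here to avoid an invalid-escape warning.
unknown = r'\e'
print(len(unknown))
```

Once the literal has been parsed, nothing in the string records which spelling was used.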

Wiktor Stribiżew
Mattreex
  • Where do you get these strings from? Do you load them from a file, take from the user or something? – Szymon Bednorz Jul 13 '20 at 16:46
  • They are part of a pipeline that is generated by reading from a file. – Mattreex Jul 13 '20 at 17:38
  • Why isn’t `re.sub(r"\\(.)",r"\1",…)` all you need? Are you trying to *undo* erroneous interpretation of escape sequences rather than just avoiding it? – Davis Herring Jul 13 '20 at 21:29
  • I think it's not possible to achieve that in a way described in your question. For example, if you assign `"\001"` or `"\1"` to the string, the original information is processed and lost (`"\001"` -> `"\x01"` and also `"\1"` -> `"\x01"`), so we are unable to find the original string. Given that, the entire conversion should take place during data loading, so you have to provide more details about it. – Szymon Bednorz Jul 13 '20 at 21:41
  • @DavisHerring Correct. I need to undo erroneous interpretation. – Mattreex Jul 14 '20 at 15:54
  • @Mattreex: That’s not possible in general (as dsonyy demonstrated); it’s also *very* unclear from your question. Edit it and include some sort of criteria for a heuristic answer, since that’s all you can do. – Davis Herring Jul 14 '20 at 16:07
  • "also does not work because this converts just converts the escape characters into the hex code. \a gets converted to \x07. Even if were to somehow remove this the character a is still lost." This is not solvable, because `'\a'` and `'\x07'` **mean the same thing**, and if we are given that string, there is no way to decide whether to generate `'\\a'` or `'\\x07'`. You can apply hard-coded rules, but it's up to you to determine them. `repr` will prefer `'\x07'`. – Karl Knechtel Aug 07 '22 at 08:08

1 Answer

There is a function you may want to use for this purpose called repr().

repr() computes the “official” string representation of an object (a representation that has all information about the object) and str() is used to compute the “informal” string representation of an object (a representation that is useful for printing the object).

Example:

s = 'This is a \t string tab. And this is a \n newline character'
print(s)  # Prints `s` with a real tab and a real newline inserted in the string
print(repr(s))  # Prints `s` in quotes, with the tab and newline shown as the escape sequences \t and \n
# The repr() text contains actual backslash characters, so they can be replaced
print(repr(s).replace('\\', '_'))
# This changes nothing, because `s` itself contains no backslash characters
print(s.replace('\\', '_'))

So you can replace the backslashes in your string by calling replace() on repr(<your string>) rather than on the string itself.
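To get a from \a rather than the \x07 that repr() prefers, one option is a hard-coded rule table, as the comments suggest. The sketch below is a heuristic, not a general solution: `CONTROL_TO_LETTER` and `strip_backslashes` are hypothetical names, and it assumes the text contains no genuine tabs, newlines or other control characters, since those are turned into letters as well. Octal escapes such as \1 cannot be recovered and are left as they are.

```python
import re

# Hypothetical rule table: control characters that have a single-letter
# escape, mapped back to that letter.
CONTROL_TO_LETTER = {
    '\a': 'a', '\b': 'b', '\f': 'f', '\n': 'n',
    '\r': 'r', '\t': 't', '\v': 'v',
}

def strip_backslashes(text):
    """Heuristic: undo single-letter escapes, then drop surviving backslashes."""
    # Step 1: replace each known control character with its letter.
    restored = ''.join(CONTROL_TO_LETTER.get(ch, ch) for ch in text)
    # Step 2: remove any remaining backslash but keep the character after it
    # (the regex from the comments on the question).
    return re.sub(r'\\(.)', r'\1', restored, flags=re.DOTALL)

print(strip_backslashes('\a'))    # the bell character is mapped back to a letter
print(strip_backslashes(r'\e'))   # the backslash is dropped, the letter is kept
```

If the strings come from a file and are never parsed as Python literals, step 1 is unnecessary and the `re.sub` call alone should be enough.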

Dharman
prerakl123