How can I convert special characters in a string back into escape sequences?

Question

Suppose I have a string like 'a\tb'. If I print it, I will see a b. But I want to see a\tb instead. How can I convert my string so it will print like that?

For the opposite (converting from backslash-escape sequences in the text to the corresponding data), see [Process escape sequences in a string in Python](/questions/4020539). — Karl Knechtel, Aug 07 '22 at 09:38
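To show how the two directions relate, here is a minimal sketch of the reverse step using the built-in `unicode_escape` codec. It assumes the escaped text is pure ASCII. The codec treats non-ASCII bytes as Latin-1, so see the linked question for the general case.

```python
# Reverse direction: turn the two characters "\" and "t" back into a real tab.
escaped = r'a\tb'  # four characters: a, backslash, t, b
decoded = escaped.encode('ascii').decode('unicode_escape')
print(len(escaped), len(decoded))  # 4 3
```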

score 93 · Answer 1 · edited Sep 03 '18 at 05:33


print(repr('a\tb'))

repr() gives you the "representation" of the string (its form as a Python string literal, with non-printable characters written as escape sequences) rather than printing the string directly.
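To see exactly what `repr()` returns, compare it with the string itself. The quotes are part of `repr`'s output, which is what the comments below are about:

```python
s = 'a\tb'
print(s)                     # a, a tab character, b
print(repr(s))               # 'a\tb'  (quotes included)
print(len(s), len(repr(s)))  # 3 6
```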

edited Sep 03 '18 at 05:33

Boris Verkhovskiy


answered Oct 23 '14 at 02:02

John Zwinck



This will output the single quotes, which is `'a\tb'`, not `a\tb`. – wyz23x2 May 26 '20 at 04:09

@wyz23x2: That's true, you need to remove the quotes if you don't want them. – John Zwinck May 26 '20 at 05:06
If the single quotes are not desirable then just strip them away like `print(repr('a\tb').strip("''"))` – Tanner Dolby Jul 23 '21 at 19:54
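Check one caveat before relying on `strip`. `repr()` switches to double quotes when the string contains a single quote but no double quote, so stripping `'` characters can leave the quotes in place. Slicing off the first and last character, as the `raw` function in the next answer does, works either way. `escape_without_quotes` is a hypothetical helper name:

```python
def escape_without_quotes(s: str) -> str:
    # Hypothetical helper: repr() always wraps its result in exactly one pair
    # of quotes ('...' or "..."), so dropping the first and last character is safe.
    return repr(s)[1:-1]

text = "it's\tfine"
print(repr(text))                   # "it's\tfine"
print(repr(text).strip("'"))        # "it's\tfine"  (the double quotes remain)
print(escape_without_quotes(text))  # it's\tfine
```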

score 5 · Answer 2 · edited May 07 '21 at 14:29

1:

(Python 2)

print ur'a\tb'

Note that in Python 3.x, u'' is equal to '', and the prefix ur is invalid.
Python 3:

print(r'a\tb')

2:

(Python 3)

print('a\\tb')
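Options 1 and 2 produce the same four-character string: a backslash followed by `t`, not a tab. A quick check:

```python
raw_literal = r'a\tb'      # raw literal: the backslash is kept as-is
escaped_literal = 'a\\tb'  # normal literal: \\ means one backslash
print(raw_literal == escaped_literal, len(raw_literal))  # True 4
```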

3:

If you want to get the raw repr of an existing string, here is a small function: (Python 3.6+)

def raw(string: str, replace: bool = False) -> str:
    r"""Returns the raw representation of a string. If replace is true, replace a single backslash's repr \\ with \."""
    r = repr(string)[1:-1]  # Strip the quotes from representation
    if replace:
        r = r.replace('\\\\', '\\')
    return r

Examples:

>>> print(raw('1234'))
1234
>>> print('\t\n'); print('='*10); print(raw('\t\n'))
    

==========
\t\n
>>> print(raw('\r\\3'))
\r\\3
>>> print(raw('\r\\3', True))
\r\3

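Note this won't reproduce \N{...} Unicode escapes: by the time raw() sees the string, '\N{PLUS SIGN}' has already become '+'. Only a raw literal such as r'\N{...}' keeps the escape text. But I guess JSON doesn't have this :)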

>>> print(raw('\N{PLUS SIGN}'))
+
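If you do want `\N{...}` escapes in the output, you have to rebuild them from the characters' Unicode names. Below is a minimal sketch. It assumes you only want names for non-ASCII characters. `named_escape` is a hypothetical helper, and characters without a name fall back to `repr()`:

```python
import unicodedata

def named_escape(s: str) -> str:
    # Hypothetical helper: spell each non-ASCII character as a \N{NAME} escape.
    parts = []
    for ch in s:
        name = unicodedata.name(ch, None) if ord(ch) > 127 else None
        if name is not None:
            parts.append('\\N{' + name + '}')
        else:
            # ASCII (or unnamed) characters: reuse repr() without its quotes
            parts.append(repr(ch)[1:-1])
    return ''.join(parts)

print(named_escape('caf\u00e9\t'))  # caf\N{LATIN SMALL LETTER E WITH ACUTE}\t
```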

edited May 07 '21 at 14:29

answered Apr 27 '20 at 04:20

wyz23x2


"Note this won't work for \N{...} Unicode escapes" Well, yes; just like it doesn't let you get `'\11'` nor `'\x09'` for a string that contains just a tab, but only `'\t'`. There are multiple possible representations for any given string, but `repr` can only possibly give back one of them. – Karl Knechtel Aug 07 '22 at 09:50
"If replace is true, replace a single backslash's repr \\ with \" This is dangerous functionality to add. For example, `>>> print(raw('\\n', True))` will print `\n`. String literals like that are [deprecated](/q/52335970/) anyway. – Karl Knechtel Aug 08 '22 at 03:20

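You can verify the ambiguity described in the comment above directly. With `replace=True`, a newline and a literal backslash followed by `n` come out identical. The snippet copies `raw` from the answer (with a raw docstring) so it runs on its own:

```python
def raw(string: str, replace: bool = False) -> str:
    r"""Returns the raw representation of a string. If replace is true, replace a single backslash's repr \\ with \."""
    r = repr(string)[1:-1]  # Strip the quotes from representation
    if replace:
        r = r.replace('\\\\', '\\')
    return r

newline = '\n'       # one character: a newline
backslash_n = '\\n'  # two characters: a backslash and "n"
print(raw(newline), raw(backslash_n))        # \n \\n  (distinguishable)
print(raw(newline), raw(backslash_n, True))  # \n \n   (collision)
```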
score -3 · Answer 3 · answered Aug 08 '22 at 04:09

Please beware that the problem is underspecified in general. The built-in repr function will give you the canonical representation of the string as a string literal, as shown in the other answers. However, that may not be what you want.

For every string except the empty string, there are multiple ways of specifying the string contents. Even something as simple as ' ' could be written as '\x20', or '\u0020', or '\U00000020'. All of these are the same string (and that's ignoring the choice of enclosing quotes).
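All of those spellings produce the same one-character string, and `repr()` picks just one of them:

```python
spellings = [' ', '\x20', '\u0020', '\U00000020']
print(all(s == ' ' for s in spellings))  # True
print(repr('\x20'))                      # ' '
```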

Python's choices are not always what you might expect. Newlines will be represented as '\n', but backspace characters for example will be represented as a hex code, not as '\b'. On the other hand, even really fancy characters like emoji may very well be included literally, not escaped.
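You can confirm these choices quickly in CPython 3:

```python
print(repr('\n'))          # '\n'
print(repr('\b'))          # '\x08'  (a hex code, not \b)
print(repr('\U0001F60A'))  # the emoji itself, between quotes
```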

If you want to change that behaviour, you will have to write it yourself, after defining the specific rules you want to apply. One useful tool for this is the translate method of strings, which can simply apply a mapping to each Unicode code point in the input. The static method str.maketrans can help with creating such a mapping, but that's still underpowered: you're stuck giving a specific set of code points to translate (and their translations), and then everything not specified is left alone.
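Here is a minimal sketch of the `str.maketrans` route. The three mappings are an arbitrary example, and anything not listed (here `\x01`) is left untouched:

```python
# Map each chosen character to the escape text we want to see instead.
table = str.maketrans({'\t': '\\t', '\n': '\\n', '\b': '\\b'})
text = 'a\tb\n\b\x01'
escaped = text.translate(table)
print(repr(escaped))  # 'a\\tb\\n\\b\x01'  (\x01 was not in the table)
```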

If you want to convert large amounts of code points in a way that follows some kind of pattern, you might write a function for that. However, handling special cases, or multiple separate blocks of Unicode code points, could end up with a lot of tedious branching logic.
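For comparison, a pattern-based function might look like the sketch below. The rule set is an illustrative assumption: escape backslashes, keep other printable characters, and spell everything else by code point width. `escape_nonprintable` is a hypothetical name. Note that a tab comes out as `\x09`, because adding `\t`-style special cases would mean yet more branches:

```python
def escape_nonprintable(s: str) -> str:
    # Hypothetical rule set: escape backslashes so the output is unambiguous,
    # keep other printable characters, and spell the rest by code point width.
    out = []
    for ch in s:
        cp = ord(ch)
        if ch == '\\':
            out.append('\\\\')
        elif ch.isprintable():
            out.append(ch)
        elif cp < 0x100:
            out.append(f'\\x{cp:02x}')
        elif cp < 0x10000:
            out.append(f'\\u{cp:04x}')
        else:
            out.append(f'\\U{cp:08x}')
    return ''.join(out)

print(escape_nonprintable('a\tb\\c\u200b人'))  # a\x09b\\c\u200b人
```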

Here is my attempt to get the best of both worlds. We create a dict subclass that implements __missing__ to dispatch to a function handling an appropriate character range, and caches the result. We initialize it with an iterable or mapping of hard-coded values (per the base dict constructor), *args that give (function, range) pairs (the function will be used to compute the result for characters with numeric values falling in the range), and **kwargs (again per the base dict constructor). We accept Unicode characters as the keys, but translate looks up numeric code point values, so the constructor converts each key with ord().

class strtrans(dict):
    def __init__(self, iterable_or_mapping, *args, **kwargs):
        self._handlers = args
        temp = dict(iterable_or_mapping, **kwargs)
        super().__init__({ord(k): v for k, v in temp.items()})
    def __missing__(self, value):
        self[value] = value # if no handler, leave the character untouched
        for func, r in self._handlers:
            if value in r:
                self[value] = func(value)
                break
        return self[value] # not missing any more.

Let's test it:

>>> hardcoded = {'\n': '\\n', '\t': '\\t', '\b': '\\b'}
>>> # Using the `.format` method bound to a string is a quick way
>>> # to get a callable that accepts the input number and formats it in.
>>> # For uppercase, use :02X instead of :02x etc.
>>> backslash_x = ('\\x{:02x}'.format, range(256))
>>> backslash_u = ('\\u{:04x}'.format, range(256, 65536))
>>> backslash_U = ('\\U{:08x}'.format, range(65536, 0x110000))
>>> mapping = strtrans(hardcoded, backslash_x, backslash_u, backslash_U)
>>> test = '\n\t\b\x01\u4EBA\U0001F60A'
>>> print(test.translate(mapping)) # custom behaviour - note lowercase output
\n\t\b\x01\u4eba\U0001f60a
>>> print(repr(test)) # canonical representation, with enclosing quotes
'\n\t\x08\x01人😊'
>>> print(test) # your terminal's rendering may vary!

       人😊
