How to split "\t" in a string to two separate characters as "\" and "t"? (How to split Escape Sequence?)

Question

I am trying to split a string in Python into a list of characters. I know that there are a lot of ways to do this in Python, but I have a case where those methods don't give me the desired results.

The problem happens when the string contains a special character like '\t' that is written out explicitly (I don't mean a real tab).

Example:

string = "    Hello \t World."

The output I need is:

list_of_chars = [' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\', 't', ' ', 'W', 'o', 'r', 'l', 'd', '.']

But when I use the methods that are given in this question, I get a list that contains '\t' as a single item, not split into two characters.

Example:

> list(string)
> [' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\t', ' ', 'W', 'o', 'r', 'l', 'd', '.']

I want to know why this happens and how to get what I want.

`string = [x for x in r" Hello \t World."]` is the closest you'll get. — Abdou, Jan 01 '18 at 19:42
If you typed `" Hello \t World."` in Python, this is the real tab. A string containing backslash-t would be either `r" Hello \t World."` or `" Hello \\t World."`. Do you have the string in the code or are you reading a file?... — dividebyzero, Jan 01 '18 at 19:46
@dividebyzero I am reading a source code file written in a high-level-like language, and it has strings like the one I mentioned. — atefsawaed, Jan 01 '18 at 19:51
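
To illustrate the distinction raised in these comments: when the text comes from a file rather than from a Python literal, a backslash followed by t in the file is already two characters, so list() splits it as desired. A minimal sketch, using io.StringIO as a stand-in for the real source file (the file contents here are an assumption):

```python
import io

# Stand-in for the source file; the raw literal puts a backslash and a 't'
# into the "file", exactly as they would appear on disk.
fake_file = io.StringIO(r'    Hello \t World.')

line = fake_file.read()
chars = list(line)

# The backslash and the 't' are separate list items; no tab is present.
print(chars[10], chars[11])
print('\t' in line)
```

If the list still contains a real tab after reading, the tab was in the file itself, and one of the substitution approaches in the answers applies.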

Patrick Artner · Accepted Answer · 2018-01-01T20:26:31.693

You can substitute your string accordingly:

import itertools
txt = "    Hello \t World."

specials = { 
    '\a' : '\\a', #     ASCII Bell (BEL)
    '\b' : '\\b', #     ASCII Backspace (BS)
    '\f' : '\\f', #     ASCII Formfeed (FF)
    '\n' : '\\n', #     ASCII Linefeed (LF)
    '\r' : '\\r', #     ASCII Carriage Return (CR)
    '\t' : '\\t', #     ASCII Horizontal Tab (TAB)
    '\v' : '\\v'  #     ASCII Vertical Tab (VT)
}

# edited out: # txt2 = "".join([x if x not in specials else specials[x] for x in txt])
txt2 = itertools.chain(* [(list(specials[x]) if x in specials else [x]) for x in txt])

print(list(txt2))

Output:

[' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\\', 't', ' ', 'W', 
 'o', 'r', 'l', 'd', '.']

The list comprehension now tests the "positive" condition (x in specials) and uses list(itertools.chain(*[...])) instead of list("".join([...])), which should be more performant.
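
An alternative sketch, not part of the answer above: str.translate can perform the same substitution in one pass, because a translation table may map a single character to a multi-character string. The names escape_table and split_visible are illustrative only.

```python
# Map each control character to its two-character escaped spelling.
escape_table = str.maketrans({
    '\a': '\\a', '\b': '\\b', '\f': '\\f', '\n': '\\n',
    '\r': '\\r', '\t': '\\t', '\v': '\\v',
})

def split_visible(text):
    """Return the characters of text, with each listed escape split in two."""
    return list(text.translate(escape_table))

print(split_visible("    Hello \t World."))
```

Characters that are not in the table pass through unchanged, so the table can be extended with other escapes as needed.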

@vaultah Cool. I only knew about `[key]` and `.keys` - a `.get()` with a default - should have thought it would be provided. Found it here: https://docs.python.org/3/library/stdtypes.html#dict. I read about the ["flatten inner lists"](https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python) syntax but can't get my head around it. Thanks for commenting - gonna read up on translate next. — Patrick Artner, Jan 02 '18 at 09:10

Moinuddin Quadri · Answer 2 · 2018-01-01T20:17:19.593

You should take a look at the String literals documentation, which says:

The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for backslash escape sequences.

In your example string, \t is not two characters but a single character, which represents the ASCII Horizontal Tab (TAB).

To tell the Python interpreter that these are two separate characters, use a raw string (put r before the opening quote of the string):

>>> list(r"    Hello \t World.")
[' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\\', 't', ' ', 'W', 'o', 'r', 'l', 'd', '.']

Here too you'll see \\ in the resulting list, but that is just Python's way of displaying a single \.
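
A quick check of that claim, as a small sketch: the doubled backslash appears only in the repr form that lists use for display, while the string itself holds a single character.

```python
chars = list(r"\t")

print(chars)          # the list display shows each item's repr: ['\\', 't']
print(len(chars[0]))  # the backslash item has length 1
print(chars[0])       # print() shows the character itself: \
```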

To the Python interpreter, '\' is an invalid string literal, because \' inside a string represents a single quote ('). Hence '\' raises the error below: as far as Python is concerned, the string has no closing quote:

>>> '\'
  File "<stdin>", line 1
    '\'
      ^
SyntaxError: EOL while scanning string literal

If you can't declare your string as a raw string (because it's already defined or imported from some other source), you can convert it to a byte string by encoding it with "unicode-escape":

>>> my_str = "    Hello \t World."

>>> unicode_escaped_string = my_str.encode('unicode-escape')
>>> unicode_escaped_string
b'    Hello \\t World.'

Since it is a byte string, iterating over it yields integers, so you need to call chr to get the character for each byte. For example:

>>> list(map(chr, unicode_escaped_string))
[' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\\', 't', ' ', 'W', 'o', 'r', 'l', 'd', '.']
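
A slightly shorter variant, offered as a sketch rather than as part of the answer above: the "unicode-escape" output contains only ASCII bytes, so it can be decoded back to str and split with list(). Note that this codec also escapes non-ASCII characters, which may or may not be what you want.

```python
my_str = "    Hello \t World."

# Encode to escaped bytes, then decode back so list() yields 1-char strings.
escaped = my_str.encode('unicode-escape').decode('ascii')
print(list(escaped))

# Caveat: non-ASCII characters are escaped as well.
print(list('é'.encode('unicode-escape').decode('ascii')))
```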

Savir · Answer 3 · 2018-01-01T20:54:54.490

You could maybe convert the string to its Python literal form and then split it character by character?

string = "    Hello \t World."
string_raw = string.encode('unicode-escape')
print([ch for ch in string_raw])
print([chr(ch) for ch in string_raw])

Outputs:

[32, 32, 32, 32, 72, 101, 108, 108, 111, 32, 92, 116, 32, 87, 111, 114, 108, 100, 46]
[' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\\', 't', ' ', 'W', 'o', 'r', 'l', 'd', '.']

ASCII 92 is a single backslash, even though it is displayed escaped (as '\\') when the list is printed in a terminal.

score -1 · Answer 4 · answered Jan 01 '18 at 19:41

\t means tab. If you want an explicit \ character, you'll need to escape it in your string:

string = "    Hello \\t World."

Or use a raw string:

string = r"    Hello \t World."

Axnyff