How to remove tabs and newlines with a regex

Question

In Python 3.x, the special `re` sequence `\s` (in a str pattern) matches Unicode whitespace characters, including [ \t\n\r\f\v].
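
A quick way to confirm this in the interpreter is sketched below. The test characters are the six listed above plus U+00A0 (no-break space), which is an arbitrary choice of non-ASCII whitespace. The `re.ASCII` flag restricts `\s` to the ASCII set only.

```python
import re

# The six ASCII whitespace characters listed above, plus U+00A0 (no-break space).
ascii_ws = " \t\n\r\f\v"
nbsp = "\u00a0"

# With a str pattern, \s matches Unicode whitespace by default.
print(all(re.fullmatch(r"\s", ch) for ch in ascii_ws))  # True
print(bool(re.fullmatch(r"\s", nbsp)))                  # True

# With re.ASCII, \s is restricted to [ \t\n\r\f\v].
print(bool(re.fullmatch(r"\s", nbsp, flags=re.ASCII)))  # False
```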

The following piece of code is intended to replace tabs and newlines with a space.

import re
text = """Hello my friends.
    How are you doing?
I'm fine."""
output = re.sub(r'\s', ' ', text)  # raw string: '\s' in a plain literal is an invalid escape sequence
print(output)

However, the tab is still present in the output. Why?

Are you sure the "tab" isn't just a bunch of spaces? Many IDEs insert four spaces when you press Tab. Use `\t` for a tab, and it will work. — Volatility, May 03 '13 at 09:33
...but the point of my question is, '\s' is supposed to include ' ', '\n' and '\t' — Marcos Gonzalez, May 03 '13 at 09:34
Yes, but it will replace each whitespace character with a space. A group of spaces will remain a group of spaces. Use `r'\s+'` instead if you want to replace a group of whitespace characters with a single space. — Volatility, May 03 '13 at 09:34
@user1975053 We can't know. What appears here on SO is a bunch of spaces. Check whether you can get from `How` to the beginning of the line by pressing the left arrow four times; if you can, those are spaces. — jadkik94, May 03 '13 at 09:38
Do you want tabs to be replaced by a single space, or something like 4 spaces? — Vishal, May 03 '13 at 09:40
Oh, the question input box does not seem to accept tabs! If I press the Tab key, I get thrown out of the box(?) — Marcos Gonzalez, May 03 '13 at 09:42
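
The comments above boil down to one check: what does the indentation actually contain? A minimal sketch, using two hypothetical versions of the question's string (one indented with four spaces, one with a real tab). `repr()` makes a tab visible as `\t`, and `re.subn()`, which returns the new string together with the number of substitutions, shows that `\s` works one character at a time.

```python
import re

# Hypothetical versions of the question's text: four spaces vs. one real tab.
with_spaces = "Hello my friends.\n    How are you doing?\nI'm fine."
with_tab = "Hello my friends.\n\tHow are you doing?\nI'm fine."

# repr() shows escape sequences, so a real tab is printed as \t.
print(repr(with_spaces))
print(repr(with_tab))
print("\t" in with_spaces, "\t" in with_tab)  # False True

# The question's pattern does remove a real tab...
print("\t" in re.sub(r"\s", " ", with_tab))  # False
# ...but it replaces whitespace one character at a time, so four spaces stay four spaces.
print(re.subn(r"\s", " ", with_spaces)[1])  # 12
```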

score 19 · Accepted Answer · answered May 03 '13 at 09:38

The problem is (likely) that your tab character is just a bunch of spaces.

>>> re.sub(r"\s+", " ", text)
"Hello my friends. How are you doing? I'm fine."
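
A self-contained version of the same call, sketched under the assumption that the original string was indented either with four spaces or with one tab. Because `\s+` consumes a whole run of whitespace, both variants collapse to the same single-spaced sentence.

```python
import re

variants = [
    "Hello my friends.\n    How are you doing?\nI'm fine.",  # indented with four spaces
    "Hello my friends.\n\tHow are you doing?\nI'm fine.",    # indented with one tab
]

# r"\s+" replaces each run of whitespace (spaces, tabs, newlines, ...) with one space.
for text in variants:
    print(re.sub(r"\s+", " ", text))
```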

Nolen Royalty

It is indeed spaces in my question, but it isn't in my original code. How can you enter a tab in an SO question? – Marcos Gonzalez May 03 '13 at 09:45

@user1975053 other than expressing a tab as "\t" I don't believe you can. So the best you could do with your question is change your string to "Hello my friends.\n\tHow are you doing?\nI'm fine." – Nolen Royalty May 03 '13 at 09:46

Thanks a gazillion. I found a lot of answers that remove **all** whitespace from a text, which makes the text useless for NLP. This is the only one that preserves the whitespace between words and removes only the extra whitespace. Oh, and it also preserves German umlauts. :-) – Simone Jan 12 '23 at 09:01

1 Answers1