
In Python 3.x, the special re sequence '\s' matches Unicode whitespace characters including [ \t\n\r\f\v].

The following piece of code is intended to replace tabs and newlines with a space.

import re
text = """Hello my friends.
    How are you doing?
I'm fine."""
output = re.sub(r'\s', ' ', text)
print(output)

However, the tab is still present in output. Why?

Marcos Gonzalez
  • Are you sure the "tab" isn't just a bunch of spaces? Most (if not all) IDEs replace a tab with four spaces. Use `\t` for a tab, and it will work. – Volatility May 03 '13 at 09:33
  • ...but the point of my question is, '\s' is supposed to include ' ', '\n' and '\t' – Marcos Gonzalez May 03 '13 at 09:34
  • Yes, but it will replace each whitespace character with a space. A group of spaces will remain a group of spaces. Use `r'\s+'` instead if you want to replace a group of whitespace characters with a single space. – Volatility May 03 '13 at 09:34
  • @user1975053 We can't know. What is here on SO is a bunch of spaces. Check whether you can go from `How` to the beginning of the line by pressing the left arrow four times; if you can, then these are spaces. – jadkik94 May 03 '13 at 09:38
  • Do you want tabs to be replaced by a single space, or something like 4 spaces? – Vishal May 03 '13 at 09:40
  • Oh, the question input window does not seem to accept tabs! If I press on the tabulator I get thrown out(?) – Marcos Gonzalez May 03 '13 at 09:42
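
One way to settle the tab-or-spaces question raised in the comments is to inspect the string directly. This is a minimal sketch, assuming two hypothetical sample strings (`with_tab`, `with_spaces`) and a hypothetical helper `has_tab`, none of which come from the original post. `repr` prints a real tab as `\t`, while spaces show up as literal spaces.

```python
# Hypothetical sample strings: one indented with a real tab, one with four spaces.
with_tab = "Hello my friends.\n\tHow are you doing?\nI'm fine."
with_spaces = "Hello my friends.\n    How are you doing?\nI'm fine."


def has_tab(text):
    """Return True if the text contains at least one tab character."""
    return "\t" in text


# repr() makes invisible characters visible: tabs appear as \t, newlines as \n.
print(repr(with_tab))
print(repr(with_spaces))

print(has_tab(with_tab))     # True
print(has_tab(with_spaces))  # False
```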

1 Answer


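The problem is (likely) that your tab character is just a bunch of spaces. `\s` replaces each whitespace character one for one, so a run of spaces stays a run of spaces; `\s+` collapses each run into a single space: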

>>> re.sub(r"\s+", " ", text)
"Hello my friends. How are you doing? I'm fine."
Nolen Royalty
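
The difference between the two patterns can be seen side by side. This is a sketch under the assumption that the indent in the question's text is four spaces, as the answer suggests; the variable names `one_for_one` and `collapsed` are illustrative only.

```python
import re

# Same shape as the question's text, with the indent written as four spaces.
text = "Hello my friends.\n    How are you doing?\nI'm fine."

# r"\s" substitutes one whitespace character at a time:
# the newline becomes a space and the four spaces stay four spaces.
one_for_one = re.sub(r"\s", " ", text)

# r"\s+" matches each run of whitespace as a whole and replaces it once.
collapsed = re.sub(r"\s+", " ", text)

print(repr(one_for_one))
print(repr(collapsed))
```

With the first pattern, five spaces remain between `friends.` and `How` (one from the newline plus the original four), which can easily look like a surviving tab.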
  • It indeed is in my question, but it isn't in my original code. How can you enter a tab in a SO question? – Marcos Gonzalez May 03 '13 at 09:45
  • @user1975053 other than expressing a tab as "\t" I don't believe you can. So the best you could do with your question is change your string to "Hello my friends.\n\tHow are you doing?\nI'm fine." – Nolen Royalty May 03 '13 at 09:46
  • Thanks a gazillion. Found a lot of answers to remove **all** white spaces in a text making the text useless for NLP. This is the only one that preserves white spaces between words removing only extra white spaces. Oh and it also preserves German Umlauts. :-) – Simone Jan 12 '23 at 09:01
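
The comments above suggest writing the tab explicitly as `\t` and note that umlauts survive the substitution. A small sketch of both points, assuming a hypothetical input string `messy` that mixes real tabs, a non-breaking space, a newline, and non-ASCII letters. The trailing `.strip()` is an addition here, not part of the original answer, to drop the space left at the end.

```python
import re

# Hypothetical input: two real tabs, a non-breaking space (U+00A0),
# a newline, ordinary spaces, and German umlauts.
messy = "Grüße\t\taus\u00a0\n  München "

# For str patterns in Python 3, \s covers Unicode whitespace, so the
# non-breaking space is collapsed along with the tabs and the newline.
normalized = re.sub(r"\s+", " ", messy).strip()

print(normalized)  # Grüße aus München
```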