Regex Python - Replace any combination of line breaks, tabs, spaces, by single space

Question

I'm trying to find a regular expression that lets me replace all line breaks and tabs (\n, \r, \t, etc.), along with any spaces before, after, and in between them, with a single space. For example, the string

'Copyright ©\n\t\t\t\n\t\t\t2019\n\t\t\tApple Inc. All rights reserved.'

should turn into

'Copyright © 2019 Apple Inc. All rights reserved.'

Also, in the case that the original string was:

'Copyright © \n\t \t\t\n \t\t\t2019\n\t\t\t Apple Inc. All rights reserved.'

The final result should be the same.

For a single line break, in the simplest case where there are no additional spaces, it would be something like

re.sub(r"\n", " ", html)

But as I don't often deal with regular expressions, I don't know how to get what I'm after.

You don't need a regular expression for this. `' '.join('Copyright ©\n\t\t\t\n\t\t\t2019\n\t\t\tApple Inc. All rights reserved.'.split())` will give you the output you want. — BoarGules, Mar 12 '19 at 14:40
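A minimal sketch of the split/join approach from the comment above, applied to the second example string from the question. The helper names `collapse_with_split` and `collapse_with_regex` are made up for illustration. One difference worth noting: `str.split()` with no arguments also discards leading and trailing whitespace, while a plain `re.sub` on `\s+` leaves a single space at each end.

```python
import re

def collapse_with_split(text):
    # str.split() with no arguments splits on runs of whitespace and
    # ignores leading/trailing whitespace; joining with " " restores
    # exactly one space between the remaining pieces.
    return " ".join(text.split())

def collapse_with_regex(text):
    # Replace every run of one or more whitespace characters with one space.
    return re.sub(r"\s+", " ", text)

s = 'Copyright © \n\t \t\t\n \t\t\t2019\n\t\t\t Apple Inc. All rights reserved.'
print(collapse_with_split(s))
# Copyright © 2019 Apple Inc. All rights reserved.

# The two approaches differ only at the ends of the string.
print(repr(collapse_with_split("  padded\n")))   # 'padded'
print(repr(collapse_with_regex("  padded\n")))   # ' padded '
```

If the regex version should also drop whitespace at the ends, calling `.strip()` on its result is one way to do that.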

score 14 · Accepted Answer · answered Mar 12 '19 at 14:19


Try using \s, which matches any whitespace character. With a + after it, it matches a whole run of them.

>>> import re
>>> s = 'Copyright ©\n\t\t\t\n\t\t\t2019\n\t\t\tApple Inc. All rights reserved.'
>>> s = re.sub(r"\s+", " ", s)
>>> s
'Copyright © 2019 Apple Inc. All rights reserved.'

answered Mar 12 '19 at 14:19

Kevin


You can even do without the '+' after '\s' – SBylemans Mar 12 '19 at 14:23

`re.sub(r"\s", " ", s)` would certainly replace all tabs/newlines/etc. with spaces, but it would replace each run of consecutive whitespace characters with the same number of spaces. If you want "\t\t\t" to become a single space, for example, then `re.sub(r"\s+", " ", s)` is preferable. – Kevin Mar 12 '19 at 14:26
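To make the difference concrete, here is a small sketch comparing the two patterns on a run of three tabs. The sample string and variable names are illustrative only.

```python
import re

s = "a\t\t\tb"

# Without +, each whitespace character is replaced on its own,
# so three tabs become three spaces.
one_per_char = re.sub(r"\s", " ", s)

# With +, the whole run is matched at once and becomes one space.
collapsed = re.sub(r"\s+", " ", s)

print(repr(one_per_char))  # 'a   b'
print(repr(collapsed))     # 'a b'
```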
