
I'm trying to find a regular expression that replaces every run of line breaks and tabs (\n, \r, \t, etc.), together with any spaces before, after, and in between them, with a single space. For example, the string

'Copyright ©\n\t\t\t\n\t\t\t2019\n\t\t\tApple Inc. All rights reserved.'

should turn into

'Copyright © 2019 Apple Inc. All rights reserved.'

Also, if the original string were:

'Copyright © \n\t \t\t\n \t\t\t2019\n\t\t\t Apple Inc. All rights reserved.'

The final result should be the same.

For a single line break, in the simplest case where there are no additional spaces, it would be something like

re.sub(r"\n", " ", html)

But since I don't often work with regular expressions, I don't know how to handle the general case.

Filipe Aleixo
  • You don't need a regular expression for this. `' '.join('Copyright ©\n\t\t\t\n\t\t\t2019\n\t\t\tApple Inc. All rights reserved.'.split())` will give you the output you want. – BoarGules Mar 12 '19 at 14:40
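
To make the comment above concrete, here is a minimal sketch comparing the split/join approach with a regex substitution. The helper names `collapse_with_split` and `collapse_with_regex` and the padded `sample` string are made up for illustration. One difference worth knowing: `str.split()` with no arguments also drops leading and trailing whitespace, whereas the regex turns each of those runs into a single space.

```python
import re

def collapse_with_split(text):
    # str.split() with no arguments splits on runs of whitespace and
    # discards empty strings, so leading/trailing whitespace disappears.
    return " ".join(text.split())

def collapse_with_regex(text):
    # Every run of whitespace becomes one space, including runs at the
    # very start or end of the string.
    return re.sub(r"\s+", " ", text)

# Hypothetical input: the second string from the question, padded with
# extra whitespace at both ends.
sample = " \tCopyright © \n\t \t\t\n \t\t\t2019\n\t\t\t Apple Inc. All rights reserved.\n"

print(repr(collapse_with_split(sample)))
# 'Copyright © 2019 Apple Inc. All rights reserved.'
print(repr(collapse_with_regex(sample)))
# ' Copyright © 2019 Apple Inc. All rights reserved. '
```

If the regex version should also trim the ends, calling `.strip()` on its result should give the same output as the split/join version.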

1 Answer


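Try using \s, which matches any single whitespace character (space, tab, newline, carriage return, and so on). Adding + makes it match a whole run of them, so each run is replaced by one space. Use a raw string for the pattern so that Python does not treat \s as a string escape.

>>> import re
>>> s = 'Copyright ©\n\t\t\t\n\t\t\t2019\n\t\t\tApple Inc. All rights reserved.'
>>> s = re.sub(r"\s+", " ", s)
>>> s
'Copyright © 2019 Apple Inc. All rights reserved.'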
Kevin
  • You can even do without the '+' after '\s' – SBylemans Mar 12 '19 at 14:23
  • `re.sub(r"\s", " ", s)` would certainly replace all tabs, newlines, etc. with spaces, but it would replace consecutive whitespace characters with the same number of spaces. If you want "\t\t\t" to become a single space, for example, then `re.sub(r"\s+", " ", s)` is preferable. – Kevin Mar 12 '19 at 14:26
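
The difference described in the last comment can be seen with a short sketch. The input string and the variable names `one_per_char` and `collapsed` are made up for illustration.

```python
import re

s = "a\t\t\tb"  # three consecutive tabs between the letters

# Without '+', each whitespace character is replaced individually,
# so three tabs become three spaces.
one_per_char = re.sub(r"\s", " ", s)

# With '+', the whole run is matched at once and becomes one space.
collapsed = re.sub(r"\s+", " ", s)

print(repr(one_per_char))  # 'a   b'
print(repr(collapsed))     # 'a b'
```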