
I know the classic way of dealing with line breaks, tabs, etc. is to use .strip() or .replace('\n', ''). But sometimes there are special cases in which these methods fail, e.g.

    'H\xf6cke\n\n:\n\nDie'.strip()

gives: 'H\xf6cke\n\n:\n\nDie'

How can I catch these rare cases, which would otherwise have to be covered one by one (e.g. with .replace('*', ''))? The above is just one example I came across.

oeb

3 Answers

In [1]: import re

In [2]: text = 'H\xf6cke\n\n:\n\nDie'

In [3]: re.sub(r'\s+', '', text)
Out[3]: 'Höcke:Die'

\s:

Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched (but the flag affects the entire regular expression, so in such cases using an explicit [ \t\n\r\f\v] may be a better choice).
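To illustrate the ASCII-flag remark, here is a small sketch. The sample string with a non-breaking space (U+00A0) and a tab is made up for illustration; it is not from the question.

```python
import re

# Hypothetical sample: a non-breaking space (U+00A0) after "Höcke",
# plus ordinary newlines and a tab.
text = 'H\xf6cke\xa0\n\n:\t\nDie'

# Default (Unicode) \s also matches the non-breaking space.
unicode_result = re.sub(r'\s+', '', text)

# With re.ASCII, \s only matches [ \t\n\r\f\v], so U+00A0 survives.
ascii_result = re.sub(r'\s+', '', text, flags=re.ASCII)

print(repr(unicode_result))  # 'Höcke:Die'
print(repr(ascii_result))    # 'Höcke\xa0:Die'
```

So for text scraped from web pages, where non-breaking spaces are common, the default Unicode behaviour is usually what you want.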

'+'

Causes the resulting RE to match 1 or more repetitions of the preceding RE.
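When the replacement is the empty string, \s and \s+ give the same result. The '+' starts to matter if you replace with something else; the sketch below uses a single space as the replacement, which is a variation on the answer rather than part of it.

```python
import re

text = 'H\xf6cke\n\n:\n\nDie'

# Without '+', every single whitespace character is replaced separately.
per_char = re.sub(r'\s', ' ', text)

# With '+', each run of whitespace collapses into one replacement.
per_run = re.sub(r'\s+', ' ', text)

print(repr(per_char))  # 'Höcke  :  Die'
print(repr(per_run))   # 'Höcke : Die'
```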

宏杰李
    Great answer, because it reliably removes many different forms of formatting instead of me adding a .replace('*', '') for every type. – oeb Jan 27 '17 at 14:45

Use replace if you don't want to import anything:

a = "H\xf6cke\n\n:\n\nDie"
print(a.replace("\n",""))

# Höcke:Die
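Note that replace("\n", "") only removes newlines; tabs, carriage returns and other whitespace stay. If you still want to avoid an import, one alternative (not part of the original answer) is str.split() with no arguments, which splits on runs of any whitespace, followed by "".join(). The sample string with an extra tab and carriage return is a made-up assumption.

```python
# Hypothetical sample with a tab and a carriage return added.
a = "H\xf6cke\n\n:\t\r\nDie"

# replace() only targets the exact substring given, so "\t" and "\r" survive.
only_newlines = a.replace("\n", "")

# split() with no arguments splits on runs of any whitespace,
# and join() glues the remaining pieces back together.
no_whitespace = "".join(a.split())

print(repr(only_newlines))  # 'Höcke:\t\rDie'
print(repr(no_whitespace))  # 'Höcke:Die'
```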
Gábor Erdős

From the documentation of str.strip:
Return a copy of the string S with leading and trailing whitespace removed. If chars is given and not None, remove characters in chars instead.

That's why it didn't remove the '\n' within the text.
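A quick sketch of that behaviour; the padded sample string and the '*' example are made up for illustration:

```python
# Hypothetical sample: whitespace at both ends and newlines in the middle.
s = '\n  H\xf6cke\n\n:\n\nDie\t '

# strip() only trims whitespace at the two ends of the string;
# the newlines between "Höcke", ":" and "Die" are untouched.
trimmed = s.strip()

# With a chars argument, only those characters are trimmed from the ends.
custom = '**H\xf6cke**'.strip('*')

print(repr(trimmed))  # 'Höcke\n\n:\n\nDie'
print(repr(custom))   # 'Höcke'
```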

If you want to remove the '\n' occurrences, you can use:

'H\xf6cke\n\n:\n\nDie'.replace('\n','')
Output: Höcke:Die
Maya G