I have some extracted text that I want to clean up with a regex.

I have learned basic regex, but I'm not sure how to build this one:

str = '''
this is 
a line that has been cut.
This is a line that should start on a new line
'''

It should be converted to this:

str = '''
this is a line that has been cut.
This is a line that should start on a new line
'''

The pattern r'\w\n\w' seems to catch it, but I'm not sure how to replace the newline with a space without touching the characters at the end and beginning of the words.

Norfeldt

1 Answer

You can use this negative lookbehind regex with re.sub:

>>> import re
>>> str = '''
... this is 
... a line that has been cut.
... This is a line that should start on a new line
... '''
>>> print(re.sub(r'(?<!\.)\n', '', str))
this is a line that has been cut.
This is a line that should start on a new line
>>>

(?<!\.)\n matches all line breaks that are not preceded by a dot.
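As a minimal sketch of that behaviour (the variable names `text`, `joined` and `glued` are only illustrative), the snippet below applies the same pattern to the string from the question. Note that the words end up separated by a space only because the cut line already ends with one; the pattern itself replaces the newline with an empty string.

```python
import re

# The cut line ends with a trailing space, as in the question's example.
text = "this is \na line that has been cut.\nThis is a line that should start on a new line\n"

# (?<!\.) is a negative lookbehind: the newline matches only when the
# character immediately before it is not a literal dot.
joined = re.sub(r'(?<!\.)\n', '', text)
print(repr(joined))

# Without the trailing space the two words are glued together.
glued = re.sub(r'(?<!\.)\n', '', "this is\na line.")
print(repr(glued))  # 'this isa line.'
```

The final newline is removed as well, because it is not preceded by a dot either.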

If you don't want the match to depend on the presence of a dot, use:

re.sub(r'(?<=\w\s)\n', '', str)
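A small sketch of how this second pattern behaves on the question's string, assuming the cut line ends with a trailing space (the names `text` and `result` are illustrative only):

```python
import re

text = "\nthis is \na line that has been cut.\nThis is a line that should start on a new line\n"

# (?<=\w\s) is a positive lookbehind: the newline matches only when it is
# preceded by a word character followed by one whitespace character.
result = re.sub(r'(?<=\w\s)\n', '', text)
print(repr(result))

# Only the newline after "this is " is removed. The leading newline, the one
# after "cut." and the final one are all kept.
```

Because the lookbehind requires whitespace directly before the newline, a cut line that ends without a trailing space is left untouched by this pattern.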

anubhava
  • hmm.. can't make it work for the case I have https://repl.it/@Norfeldt/SuperficialCumbersomeTenrec – Norfeldt Dec 05 '17 at 11:56
  • That link doesn't even show what the original string is. Also note that I suggested `r'(?<!\.)\n'`, but you have `\w` in there as well. – anubhava Dec 05 '17 at 11:59
  • I know.. sorry.. I thought that my example would cover my use case. It didn't. I had to add `\w`, else it would add some weird places.. – Norfeldt Dec 05 '17 at 12:03
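For the `\w` variant discussed in the comments, one possible approach (an assumption here, not the pattern the asker ended up with) is to put `\w` into lookarounds on both sides of the newline. Lookarounds check the neighbouring characters without consuming them, so only the line break, plus any trailing spaces or tabs before it, is replaced by a single space. The names `text` and `fixed` are illustrative.

```python
import re

text = "this is \na line that has been cut.\nThis is a line that should start on a new line\n"

# (?<=\w)  a word character must precede the match (not consumed)
# [ \t]*   optional trailing spaces or tabs at the end of the cut line
# \n       the line break itself
# (?=\w)   a word character must follow the match (not consumed)
fixed = re.sub(r'(?<=\w)[ \t]*\n(?=\w)', ' ', text)
print(repr(fixed))

# The same pattern also handles a cut line without a trailing space.
print(re.sub(r'(?<=\w)[ \t]*\n(?=\w)', ' ', "this is\na line"))  # this is a line
```

The newline after "cut." stays in place because a dot is not a word character, and the final newline stays because nothing follows it.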