I have a text file where lines are separated by newline characters (\n) and paragraphs by double newlines (\n\n).

I want to strip out those single newlines and replace them with spaces, but I do not want the double newlines affected.

I thought something like one of these would work:

(?!\n\n)\n

\n{1}

\n{1,1}

But no luck. Everything I try ends up affecting the double newlines too. How can I write a regex that effectively "ignores" the \n\n but captures the single \n?
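To see why those attempts misbehave, here is a minimal Python sketch that reports where each pattern matches in a hypothetical sample string containing one single break and one double break (`newline_positions` is an illustrative helper, not part of any library). `\n{1}` and `\n{1,1}` are just `\n`, so they match every newline. `(?!\n\n)\n` only skips the *first* newline of a pair: at the second one, the lookahead sees `\n` followed by text, so it succeeds.

```python
import re

# Hypothetical sample: a single line break at index 4 and a
# paragraph break made of two newlines at indices 9 and 10.
text = "abcd\nefgh\n\nijkl"

def newline_positions(pattern):
    """Illustrative helper: start index of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]

for pattern in (r"(?!\n\n)\n", r"\n{1}", r"\n{1,1}"):
    # (?!\n\n)\n still matches index 10, the second half of the pair;
    # the other two match all three newlines.
    print(pattern, newline_positions(pattern))
```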

colorful-shirts

3 Answers

You can search using this regex:

(.)\n(?!\n)

And replace it with:

"\1 "

RegEx Breakdown:

  • (.)\n: Match any character except a line break, captured in group #1, followed by a line break. Because . does not match \n, this ensures the \n is not the second half of a double line break (and that we don't match an empty line).
  • (?!\n): Negative lookahead asserting that the next position is not another line break, so the \n is not the first half of a double line break. Together, these match every single line break and skip double line breaks.
  • \1 : The replacement, which writes back the captured character followed by a space.
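A minimal Python sketch, using a hypothetical sample string, of why both parts of the pattern matter: dropping the lookahead lets the first newline of a pair match, and dropping the (.) prefix lets the second one match. Only the full pattern leaves the paragraph break intact. This relies on . not matching a newline, which is the default unless re.DOTALL (or the s flag in JavaScript) is set.

```python
import re

# Hypothetical sample: a single break after "one",
# a paragraph break (two newlines) after "two".
text = "one\ntwo\n\nthree"

full = re.sub(r"(.)\n(?!\n)", r"\1 ", text)    # both guards
no_lookahead = re.sub(r"(.)\n", r"\1 ", text)  # (?!\n) removed
no_prefix = re.sub(r"\n(?!\n)", " ", text)     # (.) prefix removed

print(repr(full))          # 'one two\n\nthree'
print(repr(no_lookahead))  # 'one two \nthree'  (first newline of the pair eaten)
print(repr(no_prefix))     # 'one two\n three'  (second newline of the pair eaten)
```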

Python Code:

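import re

# Sample input: "\n" breaks a line, "\n\n" separates paragraphs.
text = "line one\nline two\n\nparagraph two\ncontinues here"
repl = re.sub(r'(.)\n(?!\n)', r'\1 ', text)

print(repl)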

JavaScript Code:

const text = "line one\nline two\n\nparagraph two\ncontinues here";
const repl = text.replace(/(.)\n(?!\n)/g, '$1 ');

console.log(repl);
anubhava
    That did it! I used $1 for my specific replacement, but after that it worked. I never thought of checking for the character *before* a single line break. I was too focused on the line break itself. Great and simple solution, thank you. – colorful-shirts Aug 31 '22 at 08:43

You'll need a negative lookahead and a negative lookbehind. Off the top of my head, /(?<!\n)\n(?!\n)/g should work: it matches a \n that is neither preceded nor followed by another \n.

That said, be aware that browser support for lookbehinds is spotty. It's gotten better since I last checked, but Safari and IE don't support them at all. (Safari has since added lookbehind support in version 16.4.)
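The lookbehind version can be sketched in Python like this, assuming a hypothetical sample string. Python's re module accepts (?<!\n) because the lookbehind is fixed-width. Unlike the capture-group answer, nothing but the newline itself is consumed, so the replacement is a plain space.

```python
import re

# Hypothetical sample text: "\n" is a line break, "\n\n" a paragraph break.
text = "one\ntwo\n\nthree\nfour"

# A newline that is neither preceded nor followed by another newline.
single_break = re.compile(r"(?<!\n)\n(?!\n)")

result = single_break.sub(" ", text)
print(repr(result))  # 'one two\n\nthree four'
```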

blhylton

I thought of a simple way to do this. It may not be the right way from a regex point of view, but it's a workaround.

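import re
sample = """This is a sentence in para1.

this is also a sentence in para1


The beginning of paragraph2 and sentence1

this is a second line in paragraph2.
"""
print(sample)
# 1. Protect the paragraph breaks (three newlines here) with a placeholder tag.
#    This assumes the text never contains "NPtag" itself.
sample = re.sub(r'\n\n\n', "NPtag", sample)
# 2. Replace the remaining line breaks (two newlines here) with a space.
sample = re.sub(r'\n\n', " ", sample)
# 3. Restore the paragraph breaks.
sample = re.sub(r"NPtag", '\n\n\n', sample)
print("OUTPUT*****\n")
print(sample)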

The workaround is to replace the paragraph break (three newlines in this case, to make the spacing easy to see) with a new-paragraph tag (NPtag), then substitute the line break (two newlines in the example above, again so the spacing is visible in a notebook environment) with a space, and finally substitute the NPtag back with the paragraph break.
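The same placeholder idea, applied to the question's actual case (\n for a line break, \n\n for a paragraph break), could look like the sketch below. join_lines and PLACEHOLDER are hypothetical names, and using the NUL character as the placeholder assumes it never occurs in the text. Plain str.replace is enough here because every pattern is a literal string; the order of the steps matters, since protecting \n\n must happen before single newlines are replaced.

```python
# Hypothetical placeholder: assumes the NUL character never appears in the text.
PLACEHOLDER = "\x00"

def join_lines(text):
    """Hypothetical helper: single newlines become spaces, double newlines stay."""
    text = text.replace("\n\n", PLACEHOLDER)  # 1. protect paragraph breaks
    text = text.replace("\n", " ")            # 2. remaining newlines -> spaces
    return text.replace(PLACEHOLDER, "\n\n")  # 3. restore paragraph breaks

sample = "First line\nsecond line\n\nNew paragraph\ncontinues"
print(join_lines(sample))
```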

Hope this helps. Eager to see other regex answers too! Happy coding!