
I want to split a text file into paragraphs, separated by 1 or more empty lines. For example:

# file.txt
"Paragraph1
Some text

Paragraph2
More text

Paragraph3
some more text"

I tried using a regex, but I'm not sure I'm doing it correctly. In the example below I'm trying to print only the second paragraph, but I get a "list index out of range" error, and when I print p[0] it prints the whole text file. What am I doing wrong? Should I use a different regular expression, or another method to split the file into paragraphs?

with open(file) as f:
    text = f.read()

p = text.split("[\r\n]+")
print(p[1])
nrse_i

3 Answers

0

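You get the error because the text was never actually split, so the list has no second element: str.split() treats its argument as a literal string, not a regular expression, and the characters "[\r\n]+" never occur in the file. Instead, you could use this separator: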

p = text.split("\n\n")
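To illustrate the difference, here is a minimal sketch. The string `text` is a made-up stand-in for the file contents, not the asker's actual file.

```python
# Sample text standing in for the file contents (an assumption for this demo).
text = "Paragraph1\nSome text\n\nParagraph2\nMore text\n\nParagraph3\nsome more text"

# str.split looks for the literal characters "[\r\n]+", which never occur,
# so the whole text comes back as one element and p[1] would raise IndexError.
literal = text.split("[\r\n]+")
print(len(literal))

# Splitting on two consecutive newlines separates the paragraphs.
p = text.split("\n\n")
print(p[1])
```

This works when paragraphs are separated by exactly one empty line. With two or more empty lines in a row, the result can contain stray leading newlines or empty strings, so a pattern-based split may be a better fit in that case.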
0

Use re.split()

>>> import re
>>> re.split(r'[\r\n][\r\n]+', text)
['Paragraph1\nSome text', 'Paragraph2\nMore text', 'Paragraph3\nsome more text']
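A slightly more tolerant variant could look like the sketch below. The helper name `split_paragraphs` is hypothetical, and the pattern assumes that lines containing only spaces or tabs should also count as empty.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines."""
    # Normalize Windows and old Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A separator is a newline, optional whitespace (possibly including
    # further newlines), then a final newline: at least one blank line.
    parts = re.split(r"\n\s*\n", text.strip())
    # Drop the empty string that an empty input would produce.
    return [part for part in parts if part]

sample = "Paragraph1\nSome text\n\n\nParagraph2\nMore text\n \t\nParagraph3\nsome more text\n"
paragraphs = split_paragraphs(sample)
print(paragraphs[1])
```

Stripping the text before splitting avoids an empty first or last element when the file starts or ends with blank lines.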
Prem Anand
0

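Try replacing each run of whitespace with a single space, using the code below. Note that the newline at the end of each line is replaced too, so the output ends up on a single line.

import re

fin = open("data.txt", "rt")
fout = open("out.txt", "wt")

# Replace every run of whitespace (including the newline) with one space.
for line in fin:
    fout.write(re.sub(r'\s+', ' ', line))

fin.close()
fout.close()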
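Since the question also asks about methods other than regex, here is a regex-free sketch that groups consecutive non-blank lines into paragraphs. The function name `paragraphs_from_lines` and the sample `lines` list are assumptions made for illustration.

```python
from itertools import groupby

def paragraphs_from_lines(lines):
    """Group consecutive non-blank lines into paragraphs."""
    result = []
    # groupby yields runs of lines sharing the same key: True for lines
    # with visible text, False for blank (or whitespace-only) lines.
    for has_text, group in groupby(lines, key=lambda line: bool(line.strip())):
        if has_text:
            # Remove line endings and rejoin the paragraph's lines.
            result.append("\n".join(line.rstrip("\r\n") for line in group))
    return result

lines = [
    "Paragraph1\n", "Some text\n", "\n", "\n",
    "Paragraph2\n", "More text\n", "\n",
    "Paragraph3\n", "some more text\n",
]
paras = paragraphs_from_lines(lines)
print(paras[1])
```

Because an open file object iterates over its lines, the same function should also accept the file directly, for example `paragraphs_from_lines(f)` inside a `with open(...) as f:` block.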