I have .docx files in a directory and I want to get all the text between two paragraphs.
Example:
Foo :
The foo is not easy, but we have to do it. We are looking for new things in our ad libitum way of life.
Bar :
I want to get:
The foo is not easy, but we have to do it.
We are looking for new things in our ad libitum way of life.
I wrote this code:
import docx
import pathlib
import glob
import re

def rf(f1):
    reader = docx.Document(f1)
    alltext = []
    for p in reader.paragraphs:
        alltext.append(p.text)
    return '\n'.join(alltext)

for f in docxfiles:
    try:
        fulltext = rf(f)
        testf = re.findall(r'Foo\s*:(.*)\s*Bar', fulltext, re.DOTALL)
        print(testf)
    except IOError:
        print('Error opening', f)
It returns None.
What am I doing wrong?
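For reference, here is a minimal sketch of just the matching step, run on a plain string instead of a real .docx file. It assumes the joined text looks like the example above, with "Foo :" and "Bar :" each on their own line and the two sentences as separate paragraphs. The names `text_between` and `sample` are hypothetical and only for illustration. It uses a non-greedy group anchored to line starts, which is one possible approach and not necessarily the right one for every document.

```python
import re

def text_between(fulltext, start="Foo", end="Bar"):
    """Return every block of text found between a 'start :' line and the next 'end :' line."""
    pattern = re.compile(
        # ^ anchors each marker to the start of a line (re.MULTILINE);
        # (.*?) is non-greedy and may span newlines (re.DOTALL).
        rf"^{re.escape(start)}[ \t]*:(.*?)^{re.escape(end)}[ \t]*:",
        re.DOTALL | re.MULTILINE,
    )
    # Strip the newlines left over around each captured block.
    return [block.strip() for block in pattern.findall(fulltext)]

# Hypothetical stand-in for the string that rf() would return.
sample = (
    "Foo :\n"
    "The foo is not easy, but we have to do it.\n"
    "We are looking for new things in our ad libitum way of life.\n"
    "Bar :"
)

for block in text_between(sample):
    print(block)
```

Note that `re.findall` always returns a list, so a pattern that matches nothing shows up as `[]` rather than `None`.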