I am trying to parse certain paragraphs out of multiple text files and store them in a list. All the text files have a format similar to this:
MODEL NUMBER: A123
MODEL INFORMATION: some info about the model
DESCRIPTION: This will be a description of the Model. It
could be multiple lines but an empty line at the end of each.
CONCLUSION: Sold a lot really profitable.
I can already pull out the fields that fit on one line, but I am having trouble with fields that span multiple lines (like 'DESCRIPTION'). The length of the description is not known in advance, but I know it always ends with an empty line (a line containing only '\n'). This is what I have so far:
import os
dir = 'Test'
DESCRIPTION = []
for files in os.listdir(dir):
    if files.endswith('.txt'):
        with open(dir + '/' + files) as File:
            reading = File.readlines()
            for num, line in enumerate(reading):
                if 'DESCRIPTION:' in line:
                    Start_line = num
                if len(line.strip()) == 0:
                    pass  # blank line found; this is where I am stuck
I don't know if it's the best approach, but what I was trying to do with if len(line.strip()) == 0: is to build a list of the blank-line numbers and then find the first value in it that is greater than Start_line. I came across the bisect module for this.
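To make the idea concrete, here is a minimal sketch of the blank-line lookup using the standard-library bisect module. The function name description_span and the sample list are hypothetical, and it assumes the DESCRIPTION: label sits at the start of its line; if no blank line follows, it falls back to the end of the file.

```python
from bisect import bisect_right


def description_span(lines):
    """Return (start, end) line indices of the DESCRIPTION paragraph, or None."""
    # Indices of blank lines; enumerate yields them already sorted,
    # which is what bisect requires.
    blank_lines = [i for i, line in enumerate(lines) if not line.strip()]
    for start, line in enumerate(lines):
        if line.startswith('DESCRIPTION:'):  # assumption: label starts the line
            # Position of the first blank-line index strictly greater than start.
            pos = bisect_right(blank_lines, start)
            # Fall back to end of file when no blank line follows.
            end = blank_lines[pos] if pos < len(blank_lines) else len(lines)
            return start, end
    return None


# Hypothetical sample in the format shown above.
sample = [
    'MODEL NUMBER: A123\n',
    '\n',
    'DESCRIPTION: This will be a description of the Model. It\n',
    'could be multiple lines but an empty line at the end of each.\n',
    '\n',
    'CONCLUSION: Sold a lot really profitable.\n',
]
span = description_span(sample)
```

The paragraph itself would then be lines[start:end]. I am not sure this is better than simply reading forward until the first blank line, since it scans the whole file for blank lines first.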
In the end, if I print DESCRIPTION, I would like to get something like this:
['DESCRIPTION: Description from file 1',
'DESCRIPTION: Description from file 2',
'DESCRIPTION: Description from file 3']
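For comparison, here is a rough sketch of how that list might be built without tracking blank-line numbers at all, by reading each file forward until the first blank line after the label. The helper names read_description and collect_descriptions are hypothetical, and the sketch assumes the label starts its line and that the description's lines should be joined with single spaces.

```python
import os


def read_description(path):
    """Return the DESCRIPTION paragraph of one file as a single string, or None."""
    collected = []
    with open(path) as handle:
        for line in handle:
            if collected:
                # Already inside the description: a blank line ends it.
                if not line.strip():
                    break
                collected.append(line.strip())
            elif line.startswith('DESCRIPTION:'):  # assumption: label starts the line
                collected.append(line.strip())
    # Assumption: continuation lines are joined with a single space.
    return ' '.join(collected) if collected else None


def collect_descriptions(directory):
    """Return one description string per .txt file, in sorted filename order."""
    descriptions = []
    for name in sorted(os.listdir(directory)):
        if name.endswith('.txt'):
            text = read_description(os.path.join(directory, name))
            if text is not None:
                descriptions.append(text)
    return descriptions
```

With my layout, this would be called as DESCRIPTION = collect_descriptions('Test'). Files without a DESCRIPTION: line are skipped rather than added as empty entries.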
Thanks.