I need to extract the text between two expressions (a beginning and an end) from a text file: the beginning and the end of a letter that is embedded in a larger file. The problem is that there are multiple potential expressions for both the beginning and the end of the letter.
I have a list of expressions that potentially qualify as beginning or end expressions. I need to extract all text between any combination of those expressions from a larger text (including the beginning and end expressions themselves) and write it to a new file.
sample_text = """Some random text
asdasd
asdasd
asdasd
**Dear my friend,
this is the text I want to extract.
Sincerly,
David**
some other random text
adasdsasd"""
My code so far:
letter_begin = ["dear", "to our", "estimated", ...]
letter_end = ["sincerly", "yours", "best regards", ...]
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "dear":  # shortcoming: only one expression possible here
            copy = True
        elif line.strip() == "sincerly":  # shortcoming: only one expression possible here
            copy = False
        elif copy:
            outfile.write(line)
The above example uses "Dear" as the letter_begin expression and "Sincerly" as the letter_end expression. I need flexible code that can catch any beginning and ending expression from the lists above, in any combination (e.g. "Dear [...] best regards" or "Estimated [...] Sincerly").
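To make the desired behaviour concrete, here is a minimal sketch of what I am after. It is not a definitive implementation. It assumes that a case-insensitive substring match on each line is enough to detect a marker, and that the line containing the end expression is the last line to keep. The function name `extract_letters` is hypothetical, and the lists only contain the expressions spelled out above (the elided entries are left out).

```python
# Marker lists from the question; the elided "..." entries are omitted here.
letter_begin = ["dear", "to our", "estimated"]
letter_end = ["sincerly", "yours", "best regards"]


def extract_letters(lines, begin_markers, end_markers):
    """Collect lines from any begin marker through any end marker, inclusive.

    Assumption: a marker counts as found if it occurs anywhere in the
    lowercased line (plain substring match, not a whole-word match).
    """
    extracted = []
    copying = False
    for line in lines:
        lowered = line.lower()
        # Start copying at the first line that contains any begin marker.
        if not copying and any(marker in lowered for marker in begin_markers):
            copying = True
        if copying:
            extracted.append(line)
            # Stop after the line that contains any end marker.
            if any(marker in lowered for marker in end_markers):
                copying = False
    return extracted
```

With files, this could be used as `outfile.writelines(extract_letters(infile, letter_begin, letter_end))` inside the `with` block from my code above. Two limitations follow from the assumptions: a short marker such as "yours" also matches inside longer words, and the signature line after the closing ("David" in the sample) is not captured because copying stops at the closing line.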