
I'm trying to help my wife review documents for work. The paragraphs of notes contain different categories, and I'm trying to extract each one as a separate string and save it to a different text file so that I can process them further later. An example paragraph is:

Observations of Client Behavior: Overall interfering behavior data trends are as followed: THIS IS THE DESIRED TEXT. Observations of Client's response to skill acquisition: Overall skill acquisition data trends ....

and I'm trying to extract just the text between "Overall interfering behavior data trends are as followed:" and the start of "Observations of Client's response to skill acquisition:".

I've experimented with regex with no success. Any help pointing me in the right direction would be much appreciated, thanks!

SSerb1989
  • The text "Overall interfering behavior" does not appear in the example paragraph. Please elaborate. – Hadar Oct 02 '21 at 11:59
  • That's because I accidentally deleted it, oops. I'll edit it now, thank you! – SSerb1989 Oct 02 '21 at 12:00
  • Not sure how structured the text is. For example, if the part you wish to extract is a single sentence that starts after a ":" and ends with a ".", you could in theory use `text.split(":")[1].split(".")[0]` – Hadar Oct 02 '21 at 12:04
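On the example paragraph, splitting on a bare ":" picks up the first colon (after "Observations of Client Behavior"), so it returns the marker sentence rather than the desired text. A regex-free variant of the same splitting idea, sketched below under the assumption that the two marker strings appear verbatim in each note, uses `str.partition` on the full markers instead. The helper name `extract_between` is hypothetical.

```python
from typing import Optional


# Hypothetical helper: return the text between two marker strings,
# or None if either marker is missing.
def extract_between(text: str, start: str, end: str) -> Optional[str]:
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None  # start marker not found
    middle, found_end, _ = rest.partition(end)
    if not found_end:
        return None  # end marker not found after the start marker
    return middle.strip()


note = (
    "Observations of Client Behavior: Overall interfering behavior data trends "
    "are as followed: THIS IS THE DESIRED TEXT. Observations of Client's response "
    "to skill acquisition: Overall skill acquisition data trends ...."
)

print(extract_between(
    note,
    "Overall interfering behavior data trends are as followed:",
    "Observations of Client's response to skill acquisition:",
))
# prints: THIS IS THE DESIRED TEXT.
```

`partition` only finds the first occurrence of each marker, so this suits one note at a time; for a file with many notes, a regex with `findall` is more convenient.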

1 Answer


Adapted from the post "Regular expression to return all characters between two special characters":

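import re

# Read the whole file at once so the pattern can match even if the note spans several lines
with open("filename.txt", "r", encoding="utf-8") as file:  # Insert the file name here
    text = file.read()

# Lazy (.*?) stops at the first end marker; re.DOTALL lets '.' match newlines too
pat = r"Overall interfering behavior data trends are as followed:(.*?)Observations of Client's response to skill acquisition:"
match = re.search(pat, text, re.DOTALL)

if match:
    print(match.group(1).strip())
else:
    print("Markers not found")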

Output for the example paragraph:

THIS IS THE DESIRED TEXT.
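Since the goal is to pull this category out of many notes and save it to a separate text file, here is a sketch extending the same pattern with `re.findall`. It assumes every note uses the exact same marker wording; the output path and the sample notes are made up for illustration.

```python
import re
from typing import List

START = "Overall interfering behavior data trends are as followed:"
END = "Observations of Client's response to skill acquisition:"

# re.escape treats the markers as literal text; re.DOTALL lets '.' cross
# line breaks; the lazy (.*?) stops at the first END, so each note gives its own match.
PATTERN = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)


def extract_all(text: str) -> List[str]:
    """Return the stripped text between every START/END pair in text."""
    return [m.strip() for m in PATTERN.findall(text)]


def save_sections(sections: List[str], path: str) -> None:
    """Write one extracted section per line to path (hypothetical output file)."""
    with open(path, "w", encoding="utf-8") as out:
        for section in sections:
            out.write(section + "\n")


# Made-up sample with two notes; in practice read the real file, e.g.
# with open("filename.txt", encoding="utf-8") as f: text = f.read()
SAMPLE = (
    "Observations of Client Behavior: Overall interfering behavior data trends are as followed: "
    "Fewer outbursts this week.\n"
    "Observations of Client's response to skill acquisition: Overall skill acquisition data trends ...\n"
    "Observations of Client Behavior: Overall interfering behavior data trends are as followed:\n"
    "Behavior stable across sessions.\n"
    "Observations of Client's response to skill acquisition: Overall skill acquisition data trends ...\n"
)

if __name__ == "__main__":
    print(extract_all(SAMPLE))
    # ['Fewer outbursts this week.', 'Behavior stable across sessions.']
```

Calling `save_sections(extract_all(text), "interfering_behavior.txt")` would then write one section per line; swapping in different START/END pairs extracts the other categories the same way.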
Balaji
  • This is a great step! I'm trying to pass a .txt file in as the "example_str", but on `print(match.group(1).strip())` I am getting `AttributeError: 'NoneType' object has no attribute 'group'` – SSerb1989 Oct 02 '21 at 12:20
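`re.search` returns `None` when the pattern does not match, and calling `.group()` on `None` raises exactly that `AttributeError`. One likely cause (an assumption, since the file itself isn't shown) is that in the .txt file the note is broken across lines: by default `.` does not match a newline, so the search fails. The made-up text below demonstrates this and the `re.DOTALL` fix:

```python
import re

# Made-up example: the note is split across lines, as it might be in a .txt file
text = (
    "Overall interfering behavior data trends are as followed:\n"
    "THIS IS THE DESIRED TEXT.\n"
    "Observations of Client's response to skill acquisition: ..."
)
pat = r"Overall interfering behavior data trends are as followed:(.*?)Observations of Client's response to skill acquisition:"

# Without re.DOTALL, '.' cannot match '\n', so the search fails and returns None
without_dotall = re.search(pat, text)
# With re.DOTALL, '.' also matches newlines and the search succeeds
with_dotall = re.search(pat, text, re.DOTALL)

print(without_dotall)  # None
if with_dotall is not None:  # always check for None before calling .group()
    print(with_dotall.group(1).strip())  # THIS IS THE DESIRED TEXT.
```

Another possible mismatch worth checking: text copied from a word processor often uses a curly apostrophe (’) in "Client’s", which the straight `'` in the pattern will not match.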