I have a text that looks like an email body, as follows.

To: Abc Cohen <abc.cohen@email.com> Cc: <braggis.mathew@nomail.com>,<samanth.castillo@email.com> Hi 
Abc, I happened to see your report. I have not seen any abnormalities and thus I don't think we 
should proceed to Braggis. I am open to your thought as well. Regards, Abc On Tue 23 Jul 2017 07:22 PM Tony Stark wrote:

Then I have two lists of key words, as follows.

no_wds = ["No","don't","Can't","Not"]
yes_wds = ["Proceed","Approve","May go ahead"]

Objective: I want to search the text above and, if any of the key words listed above are present, extract the strings between those key words. In this case, Not and don't are matched from no_wds, and Proceed is matched from yes_wds. Thus I want the extracted text as a list, as follows:

txt = ["seen any abnormalities and thus I don't think we should", "think we should"]

My approach:

I have tried

 re.findall(r'{}(.*){}'.format(re.escape('|'.join(no_wds)),re.escape('|'.join(yes_wds))),text,re.I)

Or

text_f = []
for i in no_wds:
  for j in yes_wds:
    t = re.findall(r'{}(.*){}'.format(re.escape(i),re.escape(j)),text, re.I)
    text_f.append(t)

Neither gave a suitable result. I then tried the str.find() method, also without success.

I tried to get a clue from here.

Can anybody help me solve this? I am keen to see a non-regex solution, as regex is not always a good fit. That said, a regex-based solution in which I can iterate over the lists is also welcome.

pythondumb

1 Answer

Loop through the list containing the key words and use each one as the separator (whatever.split(yourIterator)).
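A minimal sketch of that idea, assuming a case-insensitive match is wanted. The function name split_on_keys and the shortened sample sentence are made up for illustration.

```python
def split_on_keys(text, keys):
    """Map each key word found in text to the pieces around it."""
    lowered = text.lower()  # compare in lower case on both sides
    hits = {}
    for key in keys:
        pieces = lowered.split(key.lower())
        if len(pieces) > 1:  # the key occurred at least once
            hits[key] = [p.strip() for p in pieces]
    return hits


# Shortened sample sentence taken from the message in the question
sample = "I have not seen any abnormalities and thus I don't think we should proceed to Braggis."
print(split_on_keys(sample, ["don't", "Can't"]))
```

Keep in mind that str.split matches substrings, not whole words: with the key "No" this sketch would also split inside "not" and "abnormalities", so some word-level check is still needed.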

EDIT:

I am not doing your homework, but this should get you on your way:

I split the message at every space, looped through the resulting list searching for the key words, and added the index of each hit to a list. Then I used those indexes to slice the message. It is probably worth trying to slice the message without splitting it, but I am not going to do your homework. You must also find a way to automate the process when there are more indexes. Tip: check that the number of indexes is even, or you are going to have a bad time slicing. *Note that you should replace the \n characters and find a way to sort the key lists.

message = """To: Abc Cohen <abc.cohen@email.com> Cc: <braggis.mathew@nomail.com>,<samanth.castillo@email.com> Hi 
Abc, I happened to see your report. I have not seen any abnormalities and thus I don't think we 
should proceed to Braggis. I am open to your thought as well. Regards, Abc On Tue 23 Jul 2017 07:22"""

no_wds = ["No","don't","Can't","Not"]
yes_wds = ["Proceed","Approve","May go ahead"]

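# Split at every space; each word's position is what gets recorded
splittedMessage = message.split( ' ' )
# Lowercase all key words up front. zip( no_wds, yes_wds ) would stop at the
# shorter list and silently drop "Not".
keys = [ w.lower() for w in no_wds + yes_wds ]
msg = []
for i, temp in enumerate( splittedMessage ):
    if temp.lower() in keys:
        msg.append( i )

# Slice between the first two hits
found = ' '.join( splittedMessage[msg[0]:msg[1]] )
print( found )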
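
If a regex that iterates over the lists is acceptable, here is a rough sketch of one way to do it. Note that re.escape has to be applied to each key word separately: escaping the already joined string, as in the question, also escapes the | so the alternation is lost. The function name between_keywords and the shortened sample are made up for illustration, and the pairing rule (each no_wds hit is paired with the next yes_wds hit) is my reading of the expected output.

```python
import re


def between_keywords(text, start_words, end_words):
    """Return the text between each start word and the next end word."""
    # Escape every key word on its own, then join with '|'.
    # \b keeps "No" from matching inside "not" or "abnormalities".
    start_re = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, start_words)), re.I)
    end_re = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, end_words)), re.I)
    results = []
    for start in start_re.finditer(text):
        # Look for the first end word after this start word
        end = end_re.search(text, start.end())
        if end is not None:
            chunk = text[start.end():end.start()]
            # Collapse newlines and repeated spaces into single spaces
            results.append(" ".join(chunk.split()))
    return results


no_wds = ["No", "don't", "Can't", "Not"]
yes_wds = ["Proceed", "Approve", "May go ahead"]
# Shortened sample taken from the message in the question
sample = ("I happened to see your report. I have not seen any abnormalities "
          "and thus I don't think we should proceed to Braggis.")
print(between_keywords(sample, no_wds, yes_wds))
```

On this sample it returns the two strings asked for in the question. A start word with no end word after it is simply skipped.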
Felipe Gutierrez