
I would like to extract one word between two specific words, as in the example below. I expect to get the word between CALL and BACK, but I always get everything between the first CALL and the last BACK.

import regex

text = 'ask her to call Mary back when she comes back'

p = r'(?i)(?s)call(.*)back'

for match in regex.finditer(p, str(text)):
    print (match.group(1))

Expected output:

Mary

Actual output:

Mary back when she comes

Update: Thanks for the solutions. I just realized I did not describe my problem clearly. I would like to de-identify the names of people or organizations in an article. The article has sentences like 'ask her to call the office when she comes back', 'she was told to call Mary back', 'she will call NIH back when she receives the notice'.

So my goal is to extract "Mary" and "NIH" from the sentences above. In other words, a match should contain exactly one word between "call" and "back".

p = r'(?i)(?s)call(.*?)back' extracts everything between "call" and the nearest following "back", which can be several words (for example, " the office when she comes " in the first sentence).

So my question is: how do I extract only one word between "call" and "back"?

1 Answer


Here is how to approach that:

import re

text = 'ask her to call Mary back when she comes back'

for match in re.findall('(?<=call ).*?(?= back)', text):
    print(match)

Output:

Mary

Breaking it down:

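This is a positive lookbehind: (?<=pattern1). It requires pattern1 to appear immediately before the match without including it in the result.
This is a positive lookahead: (?=pattern2). It requires pattern2 to appear immediately after the match without including it in the result.
This matches the text between pattern1 and pattern2: .*? (non-greedy, so it stops at the first place where pattern2 follows).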
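To see why the non-greedy quantifier matters here, a small comparison on the same sample text may help. This is only a sketch; the variable names greedy and lazy are mine, not part of the original code.

```python
import re

text = 'ask her to call Mary back when she comes back'

# Greedy: .* first consumes the rest of the string, then backtracks
# until the lookahead succeeds, which happens at the LAST " back".
greedy = re.findall(r'(?<=call ).*(?= back)', text)

# Non-greedy: .*? grows one character at a time and stops at the
# FIRST position where " back" follows.
lazy = re.findall(r'(?<=call ).*?(?= back)', text)

print(greedy)  # ['Mary back when she comes']
print(lazy)    # ['Mary']
```

The greedy version reproduces the output from the question, and the non-greedy version gives the expected one.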



UPDATE:

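The call re.findall('(?<=call ).*?(?= back)', text) can also be written as re.findall('call (.*?) back', text). With a single capture group, re.findall returns only the captured text, so the result is the same for this example.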
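For the updated question (exactly one word between "call" and "back"), one possible approach is to replace .*? with \w+, which cannot cross a space. The sketch below assumes that a "word" is a run of letters, digits, or underscores and that matching should be case-insensitive; the names sentences, pattern, and names are only for illustration.

```python
import re

sentences = [
    'ask her to call the office when she comes back',
    'she was told to call Mary back',
    'she will call NIH back when she receives the notice',
]

# \bcall and back\b keep "recall" or "background" from matching.
# (\w+) captures a single word, so a multi-word span such as
# "the office when she comes" cannot match.
pattern = re.compile(r'\bcall\s+(\w+)\s+back\b', re.IGNORECASE)

names = [m.group(1) for s in sentences for m in pattern.finditer(s)]
print(names)  # ['Mary', 'NIH']
```

Note that this only enforces the one-word constraint. It would also capture "her" in "call her back", so some filtering (for example, a list of pronouns to skip) may still be needed before de-identifying.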

Red