6

How can I extract the substring that follows the keyword am, is, or are in a string, without including the keyword itself?

string = 'I am John'

I used:

re.findall('(?<=(am|is|are)).*', string)

This raises an error:

re.error: look-behind requires fixed-width pattern

What is the correct approach?

Wai Ha Lee
Chan

3 Answers

9
import re

s = 'I am John'

g = re.findall(r'(?:am|is|are)\s+(.*)', s)
print(g)

Prints:

['John']
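
As for why the original pattern fails: Python's re module requires every lookbehind to match a fixed number of characters, and am and is (two characters) differ in width from are (three). If a lookbehind is still preferred, one possible workaround, shown here only as a sketch, is to give each keyword its own fixed-width lookbehind and alternate between them:

```python
import re

s = 'I am John'

# Each lookbehind has a fixed width on its own (2, 2 and 3 characters),
# so combining them with | outside the lookbehind avoids the error.
pattern = re.compile(r'(?:(?<=am)|(?<=is)|(?<=are))\s+(.*)')

result = pattern.findall(s)
print(result)  # ['John']
```

The non-capturing group with a plain capture after it is usually easier to read, so this variant is mainly useful for understanding the error message.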
Andrej Kesely
3

In cases like this I like to use finditer because the match objects it returns are easier to manipulate than the strings returned by findall. You can continue to match am/is/are, but also match the rest of the string with a second subgroup, and then extract only that group from the results.

>>> import re
>>> string = 'I am John'
>>> [m.group(2) for m in re.finditer("(am|is|are)(.*)", string)]
[' John']

Based on the structure of your pattern, I'm guessing you only want at most one match out of the string. Consider using re.search instead of either findall or finditer.

>>> re.search("(am|is|are)(.*)", string).group(2)
' John'
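
One caveat with re.search: it returns None when nothing matches, so calling .group(2) directly raises AttributeError on a string without am/is/are. A minimal sketch of a guarded version follows; the helper name text_after_keyword is hypothetical and only for illustration:

```python
import re

def text_after_keyword(text):
    """Return the text after the first am/is/are, or None if none is found.

    Hypothetical helper; it reuses the pattern from the answer unchanged.
    """
    m = re.search("(am|is|are)(.*)", text)
    # re.search returns None when there is no match, so check before .group()
    return m.group(2) if m else None

print(text_after_keyword("I am John"))    # ' John' (leading space kept)
print(text_after_keyword("Hello world"))  # None
```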

If you're thinking "actually I want to match every instance of a word following am/is/are, not just the first one", that's a problem, because your .* component will match the entire rest of the string after the first am/is/are. E.g. for the string "I am John and he is Steve", it will match ' John and he is Steve'. If you want John and Steve separately, perhaps you could limit the character class that you want to match. \w seems sensible:

>>> string = "I am John and he is Steve"
>>> [m.group(2) for m in re.finditer(r"(am|is|are) (\w*)", string)]
['John', 'Steve']
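
Note that the pattern above also matches am/is/are when they occur at the end of a longer word, for example the "is" in "This". Assuming only whole-word keywords should count, a sketch with \b word boundaries might look like this (the sample sentence is made up for illustration):

```python
import re

text = "This cat is happy and we are here"

# Without word boundaries, the "is" at the end of "This" counts as a keyword.
loose = [m.group(2) for m in re.finditer(r"(am|is|are) (\w*)", text)]

# \b on both sides restricts the match to am/is/are as standalone words.
strict = [m.group(1) for m in re.finditer(r"\b(?:am|is|are)\b\s+(\w+)", text)]

print(loose)   # ['cat', 'happy', 'here']
print(strict)  # ['happy', 'here']
```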
Kevin
0

One solution is to use the str.partition method. Here is an example:

string = 'I am John'
words = ['am','is','are']

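for word in words:
    # partition returns (before, separator, after); separator is '' if word is absent
    before, sep, after = string.partition(word)
    if sep:  # print only when the keyword was actually found
        print(after)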

OUTPUT :

 John
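
Because partition works on plain substrings, it also splits on "am" inside a word such as "James", and it tries each keyword separately. A rough alternative in the same spirit, assuming whole-word matching is wanted, is re.split with maxsplit=1; the function name split_after_keyword is hypothetical:

```python
import re

def split_after_keyword(text):
    """Split once at the first whole-word am/is/are and return what follows.

    Hypothetical helper; returns None when no keyword is present.
    """
    parts = re.split(r"\b(?:am|is|are)\b", text, maxsplit=1)
    # With one split there are two parts if a keyword was found, else one.
    return parts[1].strip() if len(parts) == 2 else None

print(split_after_keyword("I am John"))        # John
print(split_after_keyword("James was here"))   # None
```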
Omer Tekbiyik