I need to parse a sentence like "Alice is a boy." into ['Alice', 'boy'], and "An elephant is a mammal." into ['elephant', 'mammal']. That is, I need to split the string on 'is' while also removing 'a'/'an'. Is there an elegant way to do it?

Taku
assiegee

2 Answers

This answer does not make use of regex, but it is one way of doing things:

s = 'Alice is a boy'
words = s.split()  # each word becomes an entry in a list
words = [word for word in words if word not in ('a', 'an', 'is')]
# words is now ['Alice', 'boy']

The main downside to this is that you would need to list out every word you want to exclude in the list comprehension.
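
One way to keep that list manageable, as a rough sketch: put the excluded words in a set, strip punctuation, and compare in lowercase so that "An" and "boy." are handled too. The names `STOP_WORDS` and `content_words` are made up for illustration, and the set only covers the words from the question.

```python
import string

# Hypothetical exclusion set; extend it with any other words to drop.
STOP_WORDS = {"a", "an", "is"}

def content_words(sentence):
    """Return the words of `sentence` that are not in STOP_WORDS."""
    # Strip surrounding punctuation so 'boy.' becomes 'boy'.
    words = [word.strip(string.punctuation) for word in sentence.split()]
    # Compare in lowercase so a leading 'An' is dropped as well.
    return [word for word in words if word and word.lower() not in STOP_WORDS]

print(content_words("Alice is a boy."))           # ['Alice', 'boy']
print(content_words("An elephant is a mammal."))  # ['elephant', 'mammal']
```

Adding a word to exclude then means editing the set only, not the comprehension.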

Deem

If you insist on using a regex, you can do it with re.search:

import re

# (?:an? )? optionally matches the article "a " or "an "
print(re.search(r'(\w+) is (?:an? )?(\w+)', "Alice is a boy.").groups())
# output: ('Alice', 'boy')

print(re.search(r'(\w+) is (?:an? )?(\w+)', "An elephant is a mammal.").groups())
# output: ('elephant', 'mammal')
# apply list() if you want it as a list
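
If you need this in more than one place, the search can be wrapped in a small helper. This is only a sketch: `PATTERN` and `parse_is_a` are hypothetical names, and the pattern assumes sentences of the form "<subject> is [a/an] <noun>". The non-capturing group `(?:an? )?` matches an optional "a " or "an ", and the helper returns None when nothing matches, so you don't call .groups() on None.

```python
import re

# Hypothetical helper; assumes the form "<subject> is [a/an] <noun>".
PATTERN = re.compile(r"(\w+) is (?:an? )?(\w+)")

def parse_is_a(sentence):
    """Return [subject, noun], or None when the sentence does not match."""
    match = PATTERN.search(sentence)
    return list(match.groups()) if match else None

print(parse_is_a("Alice is a boy."))           # ['Alice', 'boy']
print(parse_is_a("An elephant is a mammal."))  # ['elephant', 'mammal']
print(parse_is_a("No match here"))             # None
```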
Taku