I need to parse a sentence like "Alice is a boy." into ['Alice', 'boy'] and "An elephant is a mammal." into ['elephant', 'mammal']. In other words, I need to split the string on 'is' while also removing 'a'/'an'. Is there an elegant way to do it?
Can you post your attempt at the code? – Garrett Kadillak Apr 30 '17 at 22:02
What is the format of the sentence? – Peter Wood Apr 30 '17 at 22:04
Sounds like you need to [remove stopwords](http://stackoverflow.com/questions/5486337/how-to-remove-stop-words-using-nltk-or-python) and get the rest by simple splitting. – Wiktor Stribiżew Apr 30 '17 at 22:06
2 Answers
This answer does not make use of regex, but it is one way of doing things:
s = 'Alice is a boy'
s = s.split()  # each word becomes an entry in a list
s = [word for word in s if word not in ('a', 'an', 'is')]  # drop the unwanted words
The main downside to this is that you would need to list out every word you want to exclude in the list comprehension.
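One way to keep that list manageable is to hold the excluded words in a set and compare case-insensitively. The following is only a sketch: `STOP_WORDS` and `content_words` are hypothetical names, and stripping punctuation from the ends of each word is an assumption about what the input looks like.

```python
import string

# Hypothetical stop-word set; extend it with whatever words should be dropped.
STOP_WORDS = {"a", "an", "is"}

def content_words(sentence, stop_words=STOP_WORDS):
    """Return the words of `sentence` that are not stop words."""
    # Strip leading/trailing punctuation so 'boy.' becomes 'boy'.
    words = (w.strip(string.punctuation) for w in sentence.split())
    # Compare in lower case so 'An' is treated like 'an'.
    return [w for w in words if w and w.lower() not in stop_words]

print(content_words("Alice is a boy."))
print(content_words("An elephant is a mammal."))
```

With this, adding another word to exclude means changing the set rather than the comprehension.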

Deem
What about the example `An elephant is a mammal.`? Also, you forgot the full-stop. – Peter Wood May 01 '17 at 06:15
That's true, this method does not account for the full stop, oops. One could handle that with the string's `translate` method (`str.translate`). – Deem May 01 '17 at 19:21
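For reference, a minimal sketch of the `translate` idea in Python 3, assuming it is acceptable to delete every ASCII punctuation character (this also removes apostrophes and hyphens inside words). `strip_punctuation` is a hypothetical helper name.

```python
import string

# Translation table that deletes every ASCII punctuation character.
REMOVE_PUNCTUATION = str.maketrans("", "", string.punctuation)

def strip_punctuation(text):
    """Return `text` with all ASCII punctuation removed."""
    return text.translate(REMOVE_PUNCTUATION)

print(strip_punctuation("An elephant is a mammal."))
```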
If you insist on using a regex, you can do it with `re.search`:
import re

# (?:an? )? optionally matches the article "a " or "an "
print(re.search(r'(\w+) is (?:an? )?(\w+)', "Alice is a boy.").groups())
# output: ('Alice', 'boy')
print(re.search(r'(\w+) is (?:an? )?(\w+)', "An elephant is a mammal.").groups())
# output: ('elephant', 'mammal')
# apply list() if you want it as a list
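Note that `re.search` returns `None` when nothing matches, so calling `.groups()` directly raises `AttributeError` on sentences that do not fit the pattern. A small wrapper can guard against that; this is a sketch only, `parse_is_a` is a hypothetical name, and the pattern assumes the article is exactly "a" or "an".

```python
import re

# Two words around " is ", with an optional article ("a " or "an ") before the second.
PATTERN = re.compile(r"(\w+) is (?:an? )?(\w+)")

def parse_is_a(sentence):
    """Return [subject, category], or None if the sentence does not match."""
    match = PATTERN.search(sentence)
    return list(match.groups()) if match else None

print(parse_is_a("Alice is a boy."))
print(parse_is_a("An elephant is an animal."))
print(parse_is_a("Hello world"))
```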

Taku