0

I have the same problem as the one discussed in "Python extract sentence containing word", with one difference: I want to find 2 words in the same sentence. I need to extract the sentences from a corpus that contain 2 specific words. Could anyone help me, please?

Marcelo

3 Answers

2

If this is what you mean:

import re
txt = "I like to eat apple. Me too. Let's go buy some apples."
define_words = 'some apple'
print(re.findall(r"([^.]*?%s[^.]*\.)" % define_words, txt))

Output: [" Let's go buy some apples."]

You can also try with:

define_words = input("Enter string: ")

Check if the sentence contains all of the defined words:

import re
txt = "I like to eat apple. Me too. Let's go buy some apples."
words = 'go apples'.split(' ')

# Naive sentence split: everything up to and including the next period.
sentences = re.findall(r"([^.]*\.)", txt)
for sentence in sentences:
    if all(word in sentence for word in words):
        print(sentence)
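
Not part of the answer as posted, but since a regex-only check for two non-consecutive words is a natural follow-up, here is a minimal sketch using one lookahead per word. The function name `sentences_with_all` is made up for illustration, and it assumes the same naive "split on periods" notion of a sentence as the code above. The `\b` word boundaries mean `apple` will not match inside `apples`.

```python
import re

def sentences_with_all(text, words):
    """Return the sentences that contain every word in `words` as a whole word."""
    # One lookahead per word; all of them must succeed from the sentence start.
    lookaheads = "".join(r"(?=.*\b%s\b)" % re.escape(w) for w in words)
    pattern = re.compile(lookaheads, re.IGNORECASE | re.DOTALL)
    # Same naive sentence split as above: anything up to the next period.
    sentences = re.findall(r"[^.]*\.", text)
    return [s.strip() for s in sentences if pattern.match(s)]

txt = "I like to eat apple. Me too. Let's go buy some apples."
print(sentences_with_all(txt, ["go", "apples"]))  # ["Let's go buy some apples."]
print(sentences_with_all(txt, ["go", "apple"]))   # []
```

The order of the words in the sentence does not matter, because each lookahead scans the whole sentence independently.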
badc0re
  • Thanks badc0re, but I forgot to mention that those 2 words don't need to be consecutive. Is there a way to use regex and get the same result as in the answer below by moliware? – Marcelo Aug 30 '13 at 11:11
  • I have added another solution, similar to @moliware's, that uses regex. – badc0re Aug 30 '13 at 11:25
2

This would be simple using the TextBlob package together with Python's built-in sets.

Basically, iterate through the sentences of your text and check whether every search word appears in the set of words in the sentence, i.e. whether your search words are a subset of the sentence's words.

from textblob import TextBlob

search_words = {"buy", "apples"}
blob = TextBlob("I like to eat apple. Me too. Let's go buy some apples.")
matches = []
for sentence in blob.sentences:
    words = set(sentence.words)
    if search_words <= words:  # subset: every search word is in the sentence
        matches.append(str(sentence))
print(matches)
# ["Let's go buy some apples."]

Update: Or, more Pythonically,

from textblob import TextBlob

search_words = {"buy", "apples"}
blob = TextBlob("I like to eat apple. Me too. Let's go buy some apples.")
matches = [str(s) for s in blob.sentences if search_words <= set(s.words)]
print(matches)
# ["Let's go buy some apples."]
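
One thing worth keeping in mind with the set approach: a non-empty intersection only tells you that at least one search word is present, while a subset test tells you that all of them are. A small sketch with plain sets (the two token sets are made-up stand-ins for `set(sentence.words)`, so no TextBlob is needed to run it):

```python
search_words = {"buy", "apples"}

# Hypothetical token sets, standing in for set(sentence.words).
only_one = {"I", "will", "buy", "pears"}
both = {"Let", "'s", "go", "buy", "some", "apples"}

# Intersection is truthy as soon as ONE search word is present.
print(bool(search_words & only_one))  # True
print(bool(search_words & both))      # True

# Subset test is true only when ALL search words are present.
print(search_words <= only_one)       # False
print(search_words <= both)           # True
```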
Steve L
1

I think you want an answer using nltk. And I guess that those 2 words don't need to be consecutive, right?

>>> from nltk.tokenize import sent_tokenize, word_tokenize
>>> text = "I like to eat apple. Me too. Let's go buy some apples."
>>> words = ['like', 'apple']
>>> sentences = sent_tokenize(text)
>>> for sentence in sentences:
...   if (all(map(lambda word: word in sentence, words))):
...      print(sentence)
...
I like to eat apple.
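
A caveat on `word in sentence`: with a string on the right-hand side this is a substring test, so `'apple'` also matches a sentence that only contains `apples`. If whole-word matching is what you need, compare against tokens instead. A rough sketch, assuming a crude regex tokenizer as a stand-in for `word_tokenize` (the helper `contains_all_tokens` is hypothetical, not an nltk function):

```python
import re

def contains_all_tokens(sentence, words):
    # Crude stand-in for a real tokenizer: runs of letters and apostrophes.
    tokens = set(re.findall(r"[A-Za-z']+", sentence.lower()))
    return all(w.lower() in tokens for w in words)

sentence = "Let's go buy some apples."
print("apple" in sentence)                              # True (substring match)
print(contains_all_tokens(sentence, ["apple"]))         # False
print(contains_all_tokens(sentence, ["go", "apples"]))  # True
```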
moliware