2

Possible Duplicate:
Check if multiple strings exist in another string

Say I have a list of allowed words/phrases:

'Stack'
'Overflow'
'Stack Overflow'
'Stack Exchange'
'Exchange'

and the following text to parse:

'Hello, and welcome to Stack Overflow. 
 Here are some words which should match: Stack, Exchange.'

I'd like to get the list of words which are found in the allowed list:

  • 'Stack Overflow'
  • 'Stack'
  • 'Exchange'

What would be the best way to achieve the result?

The allowed list I'll be using could be at least a thousand words/phrases.

john2x
  • 1
    Seems like this answer from another question has some interesting pointers: http://stackoverflow.com/a/3261300/89391 – miku Dec 30 '12 at 00:57
  • 1
    Do you know what the largest number of words in a phrase will be? – Sam Mussmann Dec 30 '12 at 02:30
  • Thanks everyone. I'll try out the possible solutions, though re-implementing grep in Python sounds a bit daunting. @SamMussmann if by phrase you mean the words/phrases in the list, then they'll be at most 4 words, with the majority at 1-2 words. – john2x Dec 30 '12 at 04:39

3 Answers

2

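Put the allowed words in a list called words, then intersect it with the words of the text:

def intersect(x, y):
    # Items present in both sequences (order is not preserved)
    return list(set(x) & set(y))

# Split on whitespace; punctuation stays attached ('Overflow.') and
# multi-word phrases such as 'Stack Overflow' can never match this way
word_list_text = text.split()
words_found = intersect(word_list_text, words)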
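
The set-intersection idea can be extended to phrases by comparing word n-grams instead of single words. The following is only a sketch under some assumptions: the name `find_allowed` and the `max_words` parameter are made up for illustration, the limit of 4 comes from the asker's comment that phrases are at most 4 words long, and the text is cut at every punctuation character so that "Stack, Exchange" is not read as the phrase 'Stack Exchange'.

```python
import re

def find_allowed(text, allowed, max_words=4):
    """Return the allowed words/phrases that occur in text (hypothetical helper)."""
    allowed = set(allowed)  # set membership tests stay cheap for large lists
    found = set()
    # Cut the text at punctuation so an n-gram never spans e.g. "Stack, Exchange"
    for segment in re.split(r"[^\w\s]+", text):
        tokens = segment.split()
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                candidate = " ".join(tokens[i:i + n])
                if candidate in allowed:
                    found.add(candidate)
    return found

text = ("Hello, and welcome to Stack Overflow. \n"
        " Here are some words which should match: Stack, Exchange.")
allowed = ["Stack", "Overflow", "Stack Overflow", "Stack Exchange", "Exchange"]
print(sorted(find_allowed(text, allowed)))
# ['Exchange', 'Overflow', 'Stack', 'Stack Overflow']
```

Note that this reports 'Overflow' as well, since it is an allowed word in its own right, and that cutting at punctuation also splits words containing apostrophes or hyphens.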
greeness
Mihai8
0

Let `words` be the list of words you want to search for, and `paragraph` the text you want to search in:

for i in words:
    if i in paragraph:
        print(i)

This code works for the paragraph and words in your question, but note that it does substring matching: it will print 'stack' if the paragraph contains 'stackoverflow', even when 'stack' never appears as a separate word. That can be an advantage or a disadvantage, depending on your purpose. If you want to match individual words only, use this:

y = paragraph.split()
for i in words:
    if i in y:
        print(i)
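
A middle ground between substring matching and splitting on whitespace is to search for each entry with regex word boundaries. This is a sketch, not part of the original answer: `whole_word_matches` is a hypothetical name, matching is case-sensitive, and it assumes every entry starts and ends with a letter, digit or underscore so that `\b` behaves as expected.

```python
import re

def whole_word_matches(words, paragraph):
    """Return the entries of words that occur in paragraph as whole words."""
    found = []
    for w in words:
        # \b only matches at a word boundary, so 'stack' does not match
        # inside 'stackoverflow', while 'daily' still matches before '.'
        pattern = r"\b" + re.escape(w) + r"\b"
        if re.search(pattern, paragraph):
            found.append(w)
    return found

paragraph = "I read stackoverflow daily. Stack Exchange is the parent network."
words = ["stack", "Stack", "Stack Exchange", "daily"]
print(whole_word_matches(words, paragraph))
# ['Stack', 'Stack Exchange', 'daily']
```

Unlike the `split()` version, this also finds multi-word phrases and words followed by punctuation, at the cost of one regex search per entry.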
Pradyumna
  • Shouldn't you use `in` instead of `__contains__`? `in` will call `__contains__` under the hood, and is more idiomatic. – Sam Mussmann Dec 30 '12 at 02:29
0

If you have the phrases as:

phrases = ['Stack','Overflow','Stack Overflow','Stack Exchange','Exchange']

then the text as:

text = """Hello, and welcome to Stack Overflow. 
Here are some words which should match: Stack, Exchange."""

The matches can then be collected with a list comprehension:

found_words = [word for word in phrases if word in text]

This keeps only the phrases that occur in the text. Note, though, that it will also return 'Overflow' in addition to the ones you listed, because 'Overflow' appears inside 'Stack Overflow'.
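
If 'Overflow' should not be reported when it only occurs as part of 'Stack Overflow', one possible approach is a single regex alternation that tries longer phrases first, so each stretch of text is consumed by the longest phrase that fits. This is a sketch under assumptions rather than a definitive solution: `longest_first_matches` is a hypothetical name, matching is case-sensitive, and phrases are assumed to start and end with word characters.

```python
import re

def longest_first_matches(text, phrases):
    """Return matched phrases in order of first appearance, longest match wins."""
    if not phrases:
        return []
    # Longer alternatives come first, so 'Stack Overflow' beats 'Stack'
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b"
    )
    seen = []
    for match in pattern.findall(text):
        if match not in seen:  # drop repeats, keep first-seen order
            seen.append(match)
    return seen

phrases = ['Stack', 'Overflow', 'Stack Overflow', 'Stack Exchange', 'Exchange']
text = """Hello, and welcome to Stack Overflow.
Here are some words which should match: Stack, Exchange."""
print(longest_first_matches(text, phrases))
# ['Stack Overflow', 'Stack', 'Exchange']
```

Because matches do not overlap, the 'Overflow' inside 'Stack Overflow' is never seen on its own, which gives the three results the question asks for.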

Hairr