
I am working with NLTK and would like to find all sentences that contain every word in a given set of keywords. Currently I use [x for x in tokenized_sent if 'key_word1' and 'key_word2' and 'key_word3' in x]. I would like a user to be able to enter any number of words, all of which must then appear in a sentence for it to match.

I tried putting the words in a list, user_input_list = ['key_word1','key_word2'], and writing [x for x in tokenized_sent if user_input_list[0] and user_input_list[1] in x], which seems to work, but there has to be a better way, especially one that handles an arbitrary number of words. Thanks.

Phife
    Does this answer your question? [Why does `a == b or c or d` always evaluate to True?](https://stackoverflow.com/questions/20002503/why-does-a-b-or-c-or-d-always-evaluate-to-true) – quamrana Dec 16 '20 at 14:53
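
To illustrate the point behind the linked question, here is a small sketch of how Python reads the chained condition. The sample sentences are made up for illustration and the keywords 'dog' and 'cat' are placeholders; only the name tokenized_sent comes from the question.

```python
# Hypothetical sample data; only the variable name comes from the question.
tokenized_sent = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang",
]

# `in` binds tighter than `and`, and a non-empty string is truthy,
# so this condition reduces to just `'cat' in x`.
chained = [x for x in tokenized_sent if 'dog' and 'cat' in x]

# Only the last keyword is actually tested.
last_only = [x for x in tokenized_sent if 'cat' in x]

# Testing each keyword explicitly gives the intended result.
explicit = [x for x in tokenized_sent if 'dog' in x and 'cat' in x]

print(chained == last_only)
print(explicit)
```

So the chained form silently ignores every keyword except the last one, which is why it can appear to work on some inputs.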

3 Answers


You can use set operations. Convert the user input list to a set and check whether it is a subset of each sentence's tokens.

[x for x in tokenized_sent if set(user_input_list).issubset(x)]
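
A minimal sketch of the subset approach, assuming each element of tokenized_sent is a list of word tokens rather than a raw string. The sample sentences are invented, and str.split stands in for a real word tokenizer so the example runs without NLTK data.

```python
# Hypothetical sample data: str.split stands in for a word tokenizer here.
sentences = ["the cat sat on the mat", "the dog chased the cat", "a bird sang"]
tokenized_sent = [s.split() for s in sentences]

user_input_list = ["cat", "the"]
required = set(user_input_list)  # build the set once, outside the loop

# Keep sentences whose tokens include every required word.
matches = [x for x in tokenized_sent if required.issubset(x)]

# Caveat: on a raw string, issubset compares against single characters,
# so a multi-character keyword can never match.
on_raw_string = required.issubset("the cat sat on the mat")

print([" ".join(m) for m in matches])
print(on_raw_string)
```

Unlike a substring test, this matches whole tokens only, so "cat" would not match a sentence that contains only "cats".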
busybear

I think you could use the built-in all function:

[words for words in tokenized_sent if all([keyword in words for keyword in keywords])]

Note: the first in is a membership test that yields a boolean, while the second in belongs to the for clause and iterates over the elements of the list.
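
As a rough sketch, the same idea can be wrapped in a helper that accepts any number of keywords. The function name sentences_with_all and the sample sentences are hypothetical, and the sentences are assumed to be plain strings, so in performs a substring test.

```python
# Hypothetical sample data; sentences are plain strings here.
tokenized_sent = ["the cat sat on the mat", "the dog chased the cat", "a bird sang"]

def sentences_with_all(sentences, keywords):
    """Return the sentences that contain every keyword (substring test)."""
    # A generator expression lets all() stop at the first missing keyword.
    return [s for s in sentences if all(k in s for k in keywords)]

two = sentences_with_all(tokenized_sent, ["dog", "cat"])
three = sentences_with_all(tokenized_sent, ["the", "cat", "mat"])

# all() of an empty iterable is True, so an empty keyword list matches everything.
none_given = sentences_with_all(tokenized_sent, [])

print(two)
print(three)
print(len(none_given))
```

If an empty keyword list should match nothing instead, that case needs an explicit check before the comprehension.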

Martin Dallinger

Use the built-in filter and all functions:

list(filter(lambda x: all(key in x for key in user_input_list), tokenized_sent))
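
A small sketch of how this might be tied to user input, assuming the keywords arrive as one comma-separated string. The variable raw, the helper has_all_keywords, and the sample sentences are illustrative; a fixed string stands in for a call to input() so the example runs unattended.

```python
# Hypothetical sample data; sentences are plain strings here.
tokenized_sent = ["the cat sat on the mat", "the dog chased the cat", "a bird sang"]

# Stand-in for something like input("Keywords: ").
raw = "dog, cat"

# Split on commas and drop surrounding whitespace and empty entries.
user_input_list = [w.strip() for w in raw.split(",") if w.strip()]

def has_all_keywords(sentence):
    """True if every keyword occurs in the sentence (substring test)."""
    return all(key in sentence for key in user_input_list)

# filter() is lazy; list() materializes the matching sentences.
matches = list(filter(has_all_keywords, tokenized_sent))

print(user_input_list)
print(matches)
```

A named predicate does the same job as the lambda and can be reused or tested on its own.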
JRazor