Sets are probably your best bet for speed when using the in operator, since a membership test on a set takes constant time on average.
For building a set containing only words, we need to:
1) remove the punctuation from the string;
2) split the string on whitespace.
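To see why the first step matters, here is a small sketch (the sample sentence and variable names are only for illustration) of what happens when the string is split without stripping punctuation first:

```python
# Splitting alone leaves punctuation glued to the neighbouring word.
sentence = "I love to eat apples."
naive_words = set(sentence.split())

print("apples" in naive_words)   # False
print("apples." in naive_words)  # True
```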
For removing punctuation, this answer probably has the fastest solution (using str.maketrans and string.punctuation).
Here's an example using your test case:
import string
test_string = "Hello! This is a test. I love to eat apples."
test_string_no_punctuation = test_string.translate(str.maketrans('', '', string.punctuation))
word_set = set(test_string_no_punctuation.split())
fruits = ['apples', 'oranges', 'bananas']
for fruit in fruits:
    if fruit in word_set:
        print(fruit + " is in the string")
You might want to wrap the verbose steps of removing punctuation and splitting the string into a function:
def word_set(input_string):
    return set(input_string.translate(str.maketrans('', '', string.punctuation)).split())
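As a rough usage sketch, the helper can be called once per string and the resulting set reused for every lookup. The function is repeated here so the snippet runs on its own; words and found_fruits are names made up for this illustration.

```python
import string

def word_set(input_string):
    # Strip punctuation, then split on whitespace and deduplicate.
    return set(input_string.translate(str.maketrans('', '', string.punctuation)).split())

words = word_set("Hello! This is a test. I love to eat apples.")
fruits = ['apples', 'oranges', 'bananas']

# Keep only the fruits that appear as whole words in the string.
found_fruits = [fruit for fruit in fruits if fruit in words]
print(found_fruits)  # ['apples']
```

Note that the lookup is case-sensitive, so "Apples" would not match 'apples' unless both sides are lowercased first.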