EDIT: @StephenPochmann was right; searching the string directly is faster (details at the bottom of this answer):
==============================================================
import re
import timeit

# s is the sample text: 4957 words of generated lorem ipsum (not shown here)
a_list = re.sub("[^a-zA-Z ]", "", s).split()
search_space = set("dog cat fish bear walrus".split())

def joranbeasley():
    return search_space.intersection(a_list)

def stephenPochmann():
    for needle in search_space:
        if needle in s:
            print(needle)

print("Stephen Timeit:", timeit.timeit(stephenPochmann, number=1000))
print("joran Timeit:", timeit.timeit(joranbeasley, number=1000))
results:
Stephen Timeit: 0.126952238343
joran Timeit: 0.148540107751
===============================================================
{"dog", "cat", "frog"}.intersection(my_str.split())
might give you what you need (it's hard to tell from the question), and it should be plenty fast ...
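For example, a minimal sketch (the sample sentence here is made up):

my_str = "the quick brown dog chased the cat"
matches = {"dog", "cat", "frog"}.intersection(my_str.split())
print(matches)  # {'dog', 'cat'} (set order may vary)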
If your string uses a delimiter other than spaces, you might need to pass that delimiter as an argument to split() (";" or something).
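For instance, with a semicolon-delimited string (again a made-up sample):

data = "dog;giraffe;cat"
print({"dog", "cat", "frog"}.intersection(data.split(";")))  # {'dog', 'cat'}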
You also might have to clean your input to remove things like punctuation:

my_cleaned_string = re.sub("[^a-zA-Z ]", "", my_str)  # keep letters AND spaces so split() still works
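Putting the two steps together, a sketch of the whole pipeline (sample string made up; its punctuation would otherwise hide the matches):

import re

my_str = "Is that a dog, a cat, or a walrus?"
my_cleaned_string = re.sub("[^a-zA-Z ]", "", my_str)
print({"dog", "cat", "frog"}.intersection(my_cleaned_string.split()))  # {'dog', 'cat'}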
Compared to @StephenPochmann's version, if I change it a bit (i.e. so it doesn't re-split the string every time):
import re
import timeit

# s is the sample text: 4957 words of generated lorem ipsum (not shown here)
a_list = re.sub("[^a-zA-Z ]", "", s).split()
search_space = set("dog cat fish bear walrus".split())

def stephenPochmann():
    for needle in search_space:
        if needle in a_list:
            print(needle)

def joranbeasley():
    return search_space.intersection(a_list)

print("Stephen Timeit:", timeit.timeit(stephenPochmann, number=1000))
print("joran Timeit:", timeit.timeit(joranbeasley, number=1000))
and the results:
c:\py_exp>python test_benchmark.py
Stephen Timeit: 0.356363602542
joran Timeit: 0.166205366392
After changing @StephenPochmann's version to search the string instead of the list, he is right and it is indeed faster: the substring check is a single C-level scan of the string, with no per-word hashing the way set intersection requires. I have clarified this at the top of my answer.
def stephenPochmann():
    for needle in search_space:
        if needle in s:  # search the raw string, not the word list
            print(needle)
Here are the results:
Stephen Timeit: 0.126952238343
joran Timeit: 0.148540107751