I am a programming newbie, and I was playing with some big data (the Yelp dataset, 6M+ reviews). As a simple exercise, I wanted to find a certain word in the review texts, so I started by looping through all the data with a for loop and comparing word by word. Before anyone gets triggered: I know this is the worst way of doing it. I then used NLTK to preprocess the data, put each review's words in a list, and checked for the word with the "in" keyword, which was much faster. So my question is: what makes the "in" keyword faster? And is there an even faster way, other than improving the preprocessing part?
Edit 1 (here is an example of the code):
First I tokenize the review, e.g. "This place is good" becomes ["This", "place", "is", "good"]:
from nltk.tokenize import word_tokenize

contents = word_tokenize(data[u'text'])
Then I check whether a given token is in my list of target words:

if contents[i] in list_of_targeted_words: return 1
This appeared to be faster than comparing against every target word with a nested for loop:

if contents[i] == list_of_targeted_words[j]: return 1
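
For context, here is a self-contained sketch of the two versions I am comparing (a minimal sketch, assuming nltk and its punkt tokenizer data are installed; list_of_targeted_words and the two function names are placeholders I made up, not names from the dataset):

    from nltk.tokenize import word_tokenize

    # Placeholder list; my real list comes from preprocessing the dataset
    list_of_targeted_words = ["good", "bad", "amazing"]

    def check_with_in(text):
        # Version 1: membership test with the "in" keyword
        for word in word_tokenize(text):
            if word in list_of_targeted_words:
                return 1
        return 0

    def check_with_loop(text):
        # Version 2: explicit nested for loop doing the same comparisons
        for word in word_tokenize(text):
            for target in list_of_targeted_words:
                if word == target:
                    return 1
        return 0

    print(check_with_in("This place is good"))    # -> 1
    print(check_with_loop("This place is good"))  # -> 1

Running both over the full 6M reviews, Version 1 is the one that came out much faster for me.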