Let's say I have a dataframe with a column of sentences:
    data['sentence']
    0  i like to move it move it
    1  i like to move ir move it
    2  you like to move it
    3  i liketo move it move it
    4  i like to moveit move it
    5  ye like to move it
I want to flag the sentences that contain words not found in a dictionary, like this:
    data['sentence']                OOV
    0  i like to move it move it    False
    1  i like to move ir move it    False
    2  you like to move it          False
    3  i liketo move it move it     True
    4  i like to moveit move it     True
    5  ye like to move it           True
Right now I have to iterate over every row doing:
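    data['OOV'] = False  # out of vocabulary
    for i, row in data.iterrows():
        # split the current row's sentence into its unique words
        words = set(row['sentence'].split())
        for word in words:
            if word not in dictionary:
                # one unknown word is enough to flag the sentence
                data.at[i, 'OOV'] = True
                break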
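For anyone who wants to reproduce this, here is a minimal self-contained sketch of the setup. The contents of `dictionary` are made up for illustration (my real vocabulary is much larger), and I am assuming it is a plain Python `set`; "ir" is included only so the result matches the table above.

```python
import pandas as pd

# Hypothetical vocabulary, invented for this example only.
dictionary = {"i", "you", "like", "to", "move", "it", "ir"}

data = pd.DataFrame({"sentence": [
    "i like to move it move it",
    "i like to move ir move it",
    "you like to move it",
    "i liketo move it move it",
    "i like to moveit move it",
    "ye like to move it",
]})

# Same row-by-row loop as above: flag a sentence as soon as
# one of its words is missing from the dictionary.
data["OOV"] = False
for i, row in data.iterrows():
    for word in set(row["sentence"].split()):
        if word not in dictionary:
            data.at[i, "OOV"] = True
            break

print(data)
```

With this toy dictionary, rows 3, 4 and 5 are flagged because of "liketo", "moveit" and "ye".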
Is there a way to vectorize (or speed up) this task?