I have a list of, say, 10,000 strings (A). I also have a vector of words (V).
I want to modify each string in A so that it keeps only the words that are present in V and drops the rest.
For example, suppose the first element of A is "one two three check test"
and V is ["one", "test", "nine"].
Then the modified version of the first element of A should be "one test".
The whole process needs to be repeated for every string in A; V stays the same for each comparison.
I am doing something like the following (it may have some bugs, but it shows how I am approaching the problem):
import nltk

for i in range(len(A)):
    kept = []
    tokens = nltk.word_tokenize(A[i])
    for token in tokens:
        if token in V:
            kept.append(token)
    A[i] = " ".join(kept)
The above approach is very slow and inefficient. How can I achieve the same result in a fast and efficient manner?
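One idea I have considered (a sketch, using plain str.split() instead of nltk.word_tokenize, and assuming A is an ordinary Python list of strings) is converting V to a set so each membership check is O(1) instead of a scan over V, and building the result with a list comprehension:

```python
A = ["one two three check test", "nine ten check"]
V = ["one", "test", "nine"]

vocab = set(V)  # set membership test is O(1), vs O(len(V)) for a list
# Keep only whitespace-separated words that appear in vocab
A = [" ".join(word for word in s.split() if word in vocab) for s in A]
# A is now ["one test", "nine"]
```

I am not sure whether this is the idiomatic way, or whether the tokenization itself is the bottleneck.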