I have a list of lists composed of tokenized texts. The list length is around 1,200,000 texts. An example of this list is shown below:
texts = [
['hi', 'how', 'are', 'you'],
['i', 'am', 'fine', 'thank', 'you'],
...
]
I'm trying to remove from each list the words that appear in another list. That other list contains around 90,000 words and looks like the following:
removing_words = ['ok', 'bye', 'hi', ...]
My code to do this is:
texts = [[token for token in text if token not in removing_words] for text in texts]
It works fine, but it is very, very slow. Any idea of how I can improve this? Thank you so much!
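Edit: for context, here is a self-contained version of what I'm running, plus one variant I'm experimenting with: converting removing_words to a set before filtering. My assumption (not carefully benchmarked) is that the membership test against the 90,000-word list is the bottleneck, since a list scan is O(n) per token while a set lookup is O(1) on average:

# Self-contained sketch; the data here is shortened for illustration.
texts = [
    ['hi', 'how', 'are', 'you'],
    ['i', 'am', 'fine', 'thank', 'you'],
]
removing_words = ['ok', 'bye', 'hi']

# Assumption: hashing the stop words once makes each
# `token not in ...` check an O(1) lookup instead of an
# O(len(removing_words)) scan over the whole list.
removing_set = set(removing_words)

texts = [[token for token in text if token not in removing_set]
         for text in texts]

print(texts)  # [['how', 'are', 'you'], ['i', 'am', 'fine', 'thank', 'you']]

Is this the right direction, or is there a faster approach for data of this size?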