Given a list of tokens, I want to replace every occurrence of those tokens in a tokenized text with whitespace.
For example, given ['a', 'is'] and 'this is a test', the result should be 'this test'.
I tried the code from "How can I do multiple substitutions using regex in python?", but the output is 'th test'.
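The 'th test' output suggests the pattern matched 'is' inside 'this' as a plain substring. Below is a minimal sketch of one way around that. It assumes the tokens in the text are separated by whitespace. It builds a single alternation from `re.escape`d tokens and anchors it with `(?<!\S)` and `(?!\S)`, so only whole whitespace-delimited tokens match. Unlike `\b`, these anchors also work for punctuation tokens such as ','. `replace_tokens` is a hypothetical helper name.

```python
import re


def replace_tokens(text, tokens):
    """Replace each whole, whitespace-delimited token from `tokens` with one space."""
    if not tokens:
        # An empty alternation would match the empty string at every gap.
        return text
    # Escape each token so characters like '.' or '?' are matched literally.
    alternation = "|".join(re.escape(t) for t in tokens)
    # (?<!\S) / (?!\S): the match must be preceded and followed by whitespace
    # or a string boundary, so 'is' inside 'this' is left untouched.
    pattern = re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")
    return pattern.sub(" ", text)


result = replace_tokens("this is a test", ["a", "is"])
print(repr(result))                 # 'this     test'
print(repr(" ".join(result.split())))  # 'this test'
```

Each matched token becomes a single space, so the separators around it remain. Collapsing runs of whitespace with `" ".join(result.split())` gives the 'this test' form from the example.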
Also, the token list is long (about 1k tokens) and the text file is large, so speed matters as well.
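Because the text is already tokenized, a regex may not be needed at all. Python's backtracking `re` engine tries a 1k-way alternation one alternative at a time, which can get slow. Below is a sketch that assumes whitespace-separated tokens and a UTF-8 file processed line by line. It splits each line and drops the words found in a `set`. Set membership is an average O(1) lookup, so the cost grows with the size of the text rather than with the length of the token list. The function names and file paths are hypothetical.

```python
def strip_tokens_line(line, stopset):
    # Keep only the words that are not in the token set; split() also
    # discards the trailing newline and collapses runs of whitespace.
    return " ".join(w for w in line.split() if w not in stopset)


def strip_tokens_file(src_path, dst_path, tokens):
    # Build the set once; reading line by line keeps memory use flat
    # even for a large file.
    stopset = set(tokens)
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_tokens_line(line, stopset) + "\n")


print(strip_tokens_line("this is a test\n", {"a", "is"}))  # this test
```

This drops the tokens and normalizes spacing in one pass, so it produces 'this test' directly. If each token must instead leave a placeholder space, replace the filter with `" " if w in stopset else w` inside the join.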