I am processing data from a JSON file for machine learning. The data are sentences, which are read into an array and tokenized with NLTK. Each sentence array then looks something like `['set', 'a', 'timer', 'for', '*int', '*unit_of_time']`, which is correct. I would like to remove all elements that contain a `'*'`. This works correctly 90% of the time, but if two elements containing a `'*'` appear in succession, the second one is left behind. So if I run:
    import nltk

    # pattern is one sentence string taken from the JSON data
    words = nltk.word_tokenize(pattern)
    # words is now ['set', 'a', 'timer', 'for', '*int', '*unit_of_time']
    for word in words:
        if '*' in word:
            words.remove(word)
I am left with `words = ['set', 'a', 'timer', 'for', '*unit_of_time']`, but I should be left with `words = ['set', 'a', 'timer', 'for']`. The loop successfully removes `'*int'`, but not `'*unit_of_time'`.
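The behavior can be reproduced without NLTK, which suggests the tokenizer is not involved. The sketch below hard-codes the token list (an assumption standing in for the `nltk.word_tokenize` output), and the helper names `remove_in_place` and `filter_copy` are hypothetical, introduced only for illustration. The first helper repeats the loop from above. The second builds a new list with a comprehension for comparison, so nothing is removed from the list being iterated.

```python
def remove_in_place(tokens):
    """Same loop as in the question: removes items while iterating over the list."""
    tokens = list(tokens)  # work on a copy so the caller's list is untouched
    for word in tokens:
        if '*' in word:
            # Removing shifts later items one position to the left, so the
            # loop's next step skips the item that followed the removed one.
            tokens.remove(word)
    return tokens


def filter_copy(tokens):
    """Builds a new list and leaves the iterated list unmodified."""
    return [word for word in tokens if '*' not in word]


# Hard-coded stand-in for the tokenizer output (assumption).
tokens = ['set', 'a', 'timer', 'for', '*int', '*unit_of_time']

print(remove_in_place(tokens))  # ['set', 'a', 'timer', 'for', '*unit_of_time']
print(filter_copy(tokens))      # ['set', 'a', 'timer', 'for']
```

The in-place loop leaves the second starred token behind exactly as described, while the comprehension returns the list I expected.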
Am I doing this incorrectly? I am using Python 3.7 on Ubuntu 19.10.
If I can provide any additional information, please let me know.