I have a list of strings, text:
text = ["this","is","a","string","identifier","identifier","identifier","identifier","identifier","identifier","identifier","identifier","please","help"]
I want to get rid of some, but not all, of the duplicates in this list. For example, the list above contains 8 instances of the word "identifier"; I want to remove all but 2 of them while leaving the remaining items in their original positions. The desired output would be:
text = ["this","is","a","string","identifier","identifier","please","help"]
I have a list of the words whose duplicates I want to remove, stored in the variable delete_words:
delete_words = ["identifier"]
To do this, I have:
from collections import Counter
temp_text = text # set up a static temporary variable so indexing issues don't arise
for word in temp_text:
    # keep count of words in text
    temp_word_counter_dict = dict(Counter([word for word in text]))
    if word in delete_words and temp_word_counter_dict[word] > 2:
        text.remove(word)
The problems with the code above are:
1. The for loop doesn't iterate through every word; it only iterates through the first 10 words, so the resulting text variable still has 4 instances of the word "identifier" rather than 2.
2. For some reason I can't understand, the temporary variable temp_text is having its words removed as well, even though I only call text.remove(word), so I have no idea what is going on here.
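The second point can be reproduced in isolation. The sketch below (the short list and the names alias and snapshot are made up for illustration) suggests that plain assignment only binds a second name to the same list object, whereas list(...) creates an independent shallow copy:

```python
text = ["a", "identifier", "identifier", "identifier", "b"]

alias = text           # plain assignment: a second name for the same list object
snapshot = list(text)  # shallow copy: a separate list holding the same elements

text.remove("identifier")  # removes the first matching element, in place

print(alias is text)   # True, so the removal is visible through alias too
print(len(alias))      # 4
print(len(snapshot))   # 5, the copy is unaffected
```

If that is what is happening, it would also explain the first point: the loop is iterating over the very list it is shrinking, so elements shift left and some are never visited.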
Can anyone advise on how I can delete the duplicate words from the list while leaving 2 dupes in their original position? Thanks
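One possible approach, sketched under the assumption that the first two occurrences of each listed word are the ones to keep: build a new list in a single pass, counting how many times each word in delete_words has been seen so far. The helper name limit_duplicates and its keep parameter are hypothetical and not part of the code above.

```python
from collections import Counter


def limit_duplicates(words, delete_words, keep=2):
    """Return a new list with at most `keep` occurrences of each word in delete_words."""
    targets = set(delete_words)
    seen = Counter()  # occurrences of each target word encountered so far
    result = []
    for word in words:
        if word in targets:
            seen[word] += 1
            if seen[word] > keep:
                continue  # drop every occurrence after the first `keep`
        result.append(word)
    return result


text = ["this", "is", "a", "string"] + ["identifier"] * 8 + ["please", "help"]
delete_words = ["identifier"]

print(limit_duplicates(text, delete_words))
# ['this', 'is', 'a', 'string', 'identifier', 'identifier', 'please', 'help']
```

Building a new list sidesteps modifying a list while iterating over it, and the original text is left untouched.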