
I have a list of strings, text:

text = ["this","is","a","string","identifier","identifier","identifier","identifier","identifier","identifier","identifier","identifier","please","help"]

I want to remove some of the duplicates in this list, but not all of them. For example, the list above contains 8 instances of the word "identifier"; I want to remove all but 2 of them while leaving the remaining instances in their original positions. The desired output would be:

text = ["this","is","a","string","identifier","identifier","please","help"]

The words whose duplicates I want to remove are stored in a list of strings, delete_words:

delete_words = ["identifier"]

To do this, I have:

from collections import Counter
temp_text = text # set up a static temporary variable so indexing issues don't arise
for word in temp_text:
    # keep count of words in _text_
    temp_word_counter_dict = dict(Counter([word for word in text]))
    if word in delete_words and temp_word_counter_dict[word] > 2:
        text.remove(word)

The problems with the code above are:

  1. The for loop doesn't iterate through every word; it only iterates through the first 10 words, so the resulting text variable still has 4 instances of the word "identifier" rather than 2.

  2. For some reason I can't understand, the temporary variable temp_text also has its words removed, even though I only call text.remove(word), so I have no idea what is going on here.
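Both symptoms most likely share one cause: temp_text = text does not copy the list, it binds a second name to the same list object. Removing from text therefore also removes from temp_text, and the for loop ends up iterating over a list that shrinks under it; Python's list iterator advances by index, so after each removal the following element shifts into the current slot and gets skipped. A minimal sketch, assuming a shallow copy via list(text) is acceptable (the names alias and counts are illustrative only):

```python
from collections import Counter

text = ["this", "is", "a", "string", "identifier", "identifier", "identifier",
        "identifier", "identifier", "identifier", "identifier", "identifier",
        "please", "help"]
delete_words = ["identifier"]

alias = text
print(alias is text)  # True: both names refer to the same list object

temp_text = list(text)  # shallow copy: a separate list with the same items
print(temp_text is text)  # False

for word in temp_text:  # iterate over the copy, which is never modified
    counts = Counter(text)  # recount the words still left in text
    if word in delete_words and counts[word] > 2:
        text.remove(word)  # removes the FIRST occurrence of word in text

print(text)
```

Note that list.remove always deletes the first matching element, so with non-adjacent duplicates this loop keeps the last 2 occurrences rather than the first 2, and recounting on every pass makes it quadratic in the list length.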

Can anyone advise on how I can delete the duplicate words from the list while leaving 2 duplicates in their original positions? Thanks
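One way to sidestep mutating a list while iterating over it is to build a new list in a single pass, counting how many copies of each limited word have been kept so far. This is a sketch, not the only approach; keep_first_n is a hypothetical helper name, and it assumes the first 2 occurrences are the ones to keep:

```python
from collections import Counter


def keep_first_n(words, limited, n=2):
    """Return a new list keeping at most n occurrences of each word in
    `limited`, in original order; all other words are kept unchanged."""
    limited = set(limited)  # O(1) membership tests
    kept = Counter()  # copies kept so far, per limited word (missing -> 0)
    result = []
    for word in words:
        if word in limited:
            if kept[word] >= n:
                continue  # n copies already kept; drop this one
            kept[word] += 1
        result.append(word)
    return result


text = ["this", "is", "a", "string", "identifier", "identifier", "identifier",
        "identifier", "identifier", "identifier", "identifier", "identifier",
        "please", "help"]
delete_words = ["identifier"]

text = keep_first_n(text, delete_words)
print(text)
# ['this', 'is', 'a', 'string', 'identifier', 'identifier', 'please', 'help']
```

If other names refer to the original list and must see the change, assign into it with text[:] = keep_first_n(text, delete_words) instead of rebinding the name.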

PyRsquared