Removing duplicate elements in a list

Question

This is a homework assignment that I have worked on for hours. I have made progress but am at the end of my rope! I have a text file that I have converted to a list of words (including some capitalized words), which I have sorted in alphabetical order. The last thing to do is remove duplicate words from the list. I have found answers to questions about removing items from lists, but not about removing duplicate items. I have set up a loop which, for reasons I cannot understand, only works on half of the original list.

Here is the code I have tried:

fhand=open('romeo.txt')
data=fhand.read()
data=data.split()
# lowercase the capitalized words by hand
data[0]='but'
data[8]='it'
data[13]='juliet'
data[17]='arise'
data[25]='who'
data.sort()
newlist=[]
for x in data:
    if data[0] == data[1]:
        del data[0]
    elif data[0] != data[1]:
        newlist.append(data[0])
    del data[0]
print(newlist)

Original split text file is: ['but', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks', 'it', 'is', 'the', 'east', 'and', 'juliet', 'is', 'the', 'sun', 'arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon', 'who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief']

Expected output is: ['already', 'and', 'arise', 'breaks', 'but', 'east', 'envious', 'fair', 'grief', 'is', 'it', 'juliet', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'who', 'window', 'with', 'yonder']

Actual output is: ['already', 'and', 'arise', 'breaks', 'but', 'east', 'envious', 'fair', 'grief', 'is', 'it', 'juliet', 'kill', 'light']

So the loop does what it is supposed to do but quits after 'light'. Can't figure this out.

Just do `set(original_split_text)` and you will get the unique values. – Onyambu, Sep 18 '19 at 21:46
Possible duplicate of [How to remove items from a list while iterating?](https://stackoverflow.com/questions/1207406/how-to-remove-items-from-a-list-while-iterating) and [How to remove item from a python list in a loop?](https://stackoverflow.com/questions/8312829/how-to-remove-item-from-a-python-list-in-a-loop). – John Kugelman, Sep 18 '19 at 21:46
By the way, I am sure there is a much more elegant way to approach this problem, and no doubt this is what someone will show me. That's great. But I would also like to understand why the approach I have tried did not work. Thanks very much in advance! – Mark Schacter, Sep 18 '19 at 21:48
If you modify the list while you are iterating over it, you can miss elements, as the iterator doesn't know what you've done. – Peter Wood, Sep 18 '19 at 21:50
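Peter Wood's point can be checked directly. A Python `for` loop over a list keeps an internal position counter and stops as soon as that counter is no longer less than the list's current length. The question's loop deletes one or two words from the front on every pass, so the list shrinks faster than the counter climbs. The sketch below reruns that loop unchanged except for an added pass counter (`passes` is a name introduced here for illustration); the word list is hard-coded from the question instead of being read from `romeo.txt`.

```python
# Word list from the question (already lowercased), hard-coded instead of reading romeo.txt.
words = ['but', 'soft', 'what', 'light', 'through', 'yonder', 'window',
         'breaks', 'it', 'is', 'the', 'east', 'and', 'juliet', 'is', 'the',
         'sun', 'arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious',
         'moon', 'who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief']

data = sorted(words)
newlist = []
passes = 0  # added for illustration: counts how often the loop body runs

# The question's loop, unchanged apart from the counter.
for x in data:
    passes += 1
    if data[0] == data[1]:
        del data[0]
    elif data[0] != data[1]:
        newlist.append(data[0])
    del data[0]  # runs on every pass, so a duplicate pass deletes two words

print(passes)   # 16: the next position (16) is not less than len(data) (15)
print(newlist)  # ends with 'light', as in the question
print(data)     # the 15 words the loop never reached, starting with 'moon'
```

After 16 passes the counter has moved 16 places while 18 words have been deleted, so the loop ends with 15 words still in `data`.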

score 2 · Answer 1 · edited Sep 19 '19 at 01:07


That's not a very good way to remove duplicates from a list. Also, you shouldn't remove elements from a list while iterating over it like that. Consider using a set instead. Sets are not ordered, but since you're sorting the data before processing it, you can use `sorted` to turn the unordered set into a sorted list.

data = ['but', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks', 'it', 'is', 'the', 'east', 'and', 'juliet', 'is', 'the', 'sun', 'arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon', 'who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief']

new_data = sorted(set(data))

print(new_data)

Output:

['already', 'and', 'arise', 'breaks', 'but', 'east', 'envious', 'fair', 'grief', 'is', 'it', 'juliet', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'who', 'window', 'with', 'yonder']

You can also do this in a loop, without sets, and without del:

newlist = []

for x in sorted(data):
    if x not in newlist:  # keep only the first copy of each word
        newlist.append(x)

print(newlist)
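Because the list is sorted, equal words sit next to each other, which is the same idea the question's `data[0] == data[1]` check relies on. So each word only needs to be compared with the last word kept, rather than searched for in all of `newlist` with `not in`. A minimal sketch of that variant, with the word list repeated so the snippet runs on its own:

```python
data = ['but', 'soft', 'what', 'light', 'through', 'yonder', 'window',
        'breaks', 'it', 'is', 'the', 'east', 'and', 'juliet', 'is', 'the',
        'sun', 'arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious',
        'moon', 'who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief']

newlist = []

for x in sorted(data):
    # Sorted order puts equal words side by side, so x is new exactly
    # when it differs from the last word appended.
    if not newlist or x != newlist[-1]:
        newlist.append(x)

print(newlist)
```

This makes a single pass over the sorted list, while `x not in newlist` rescans `newlist` for every word; for a 33-word file the difference does not matter.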

edited Sep 19 '19 at 01:07

John Kugelman


answered Sep 18 '19 at 21:47

DjaouadNM


Thanks. I am at a point in learning Python where I haven't yet been introduced to sets. So I am assuming there must have been some way to accomplish this assignment based on my current knowledge. Why would my loop work well but only on part of the list? Why did it stop? That's what I want to understand. – Mark Schacter Sep 18 '19 at 21:51
Thanks. This assignment was given before we covered the topic of "sets", so presumably there is a way to get it done without that. That's what's puzzling me. And I can't figure out why my loop works on a good chunk of the list and then stops. Just seems so strange! – Mark Schacter Sep 18 '19 at 21:57
@MarkSchacter see the accepted answer [in this linked duplicate](https://stackoverflow.com/questions/6260089/strange-result-when-removing-item-from-a-list) for a good explanation of why this isn't working. – juanpa.arrivillaga Sep 18 '19 at 21:58
@MarkSchacter It could work using a loop, but why do you have to delete from `data`? – DjaouadNM Sep 18 '19 at 22:00
@MrGeek - thanks. When I remove the `del data[0]` line, my output is a list of 37 'already's (which is the first element of the sorted list). – Mark Schacter Sep 18 '19 at 22:08
@MarkSchacter Check my edit for a version with loops and no `set`s or `del`s. – DjaouadNM Sep 18 '19 at 22:11
@MrGeek yes, thanks. Nice and simple! – Mark Schacter Sep 19 '19 at 01:48
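The question's front-of-the-list idea can also be repaired without sets. Removing `del` alone leaves `data[0]` stuck on 'already', which is why every pass appended the same word. Two changes are needed instead: a `while` loop, which re-checks the list's current length before every pass rather than relying on the `for` loop's counter, and exactly one deletion per pass. In the original, the duplicate branch deletes twice, so a word that appears exactly twice (such as 'sun') would vanish entirely even if the loop ran to the end. This is a sketch of one possible repair, with the word list hard-coded from the question:

```python
# Word list from the question (already lowercased), hard-coded instead of reading romeo.txt.
words = ['but', 'soft', 'what', 'light', 'through', 'yonder', 'window',
         'breaks', 'it', 'is', 'the', 'east', 'and', 'juliet', 'is', 'the',
         'sun', 'arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious',
         'moon', 'who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief']

data = sorted(words)
newlist = []

# A while loop re-checks the list's current length before every pass,
# so it keeps going until every word has been consumed.
while data:
    # Keep the front word only if it is the last copy in its run:
    # either it is the final word, or the next word is different.
    if len(data) == 1 or data[0] != data[1]:
        newlist.append(data[0])
    del data[0]  # exactly one deletion per pass

print(newlist)
```

Deleting from the front of a list shifts every remaining element, so this is slower than the answer's versions on large inputs, but it stays close to the original code.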
