Deleting duplicates from a list: why does my code go wrong?

Question

I'm trying to write my own code to delete duplicates from a list. Here is my first attempt:

a = [2, 2, 1, 1, 1, 2, 3, 6, 6]
b = [2, 2, 1, 1, 1, 2, 3, 6, 6]
for x in a:
    a.remove(x)
    if x in a:
        b.remove(x)
print('the list without duplicates is: ', b)

Unfortunately, it produces this result:

the list without duplicates is:  [2, 1, 2, 3, 6]

Then I tried writing a second one:

a = [2, 2, 1, 1, 1, 2, 3, 6, 6]
b = [2, 2, 1, 1, 1, 2, 3, 6, 6]
for i in range(len(a)):
    for x in a:
        a.remove(x)
        if x in a:
            b.remove(x)
print('the list without duplicates is: ', b)

This second one produces the result I expected:

the list without duplicates is:  [1, 2, 3, 6]

I really don't see why the second one behaves differently from the first one. In fact, if I apply both to the list:

[2,2,1,1,2,3,6,6]

Both of them produce the same result:

the list without duplicates is:  [1, 2, 3, 6]

Yeah, I want to try on my own before relying on the built-in. I think that's good for a newbie. - Axlp1210, Jan 12 '18 at 19:21
Yeah, you could just google it, but generally, removing elements from a list while you are iterating over it is bad practice. Though sometimes I do this anyway... But is your question just why the second one worked? - SuperStew, Jan 12 '18 at 19:23
Yeah, to me both of them look just the same, right? I cannot see any difference. In fact, when I execute my first one with pen and paper, I do exactly what the second one does. - Axlp1210, Jan 12 '18 at 19:28

wwii · Accepted Answer · 2018-01-12T19:57:28.880

why the second one is different

The for loop keeps track of the index it is on. When you remove an item from the list, it messes up the count: the item that was at index i+1 has now become the item at index i and gets skipped in the next iteration.

To illustrate:

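a_list = ['a','b','c','d','e','f','g','h']
# Removing while iterating: each removal shifts the later items left by one.
# The run ends with an IndexError once a_list[i+1] is past the end of the list.
for i, item in enumerate(a_list):
    print(f"i:{i}, item:{item}, a_list[i+1]:{a_list[i+1]}")
    print(f"\tremoving {item}", end=' ---> ')
    a_list.remove(item)
    print(f"a_list[i+1]:{a_list[i+1]}")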
>>>
i:0, item:a, a_list[i+1]:b
    removing a ---> a_list[i+1]:c
i:1, item:c, a_list[i+1]:d
    removing c ---> a_list[i+1]:e
i:2, item:e, a_list[i+1]:f
    removing e ---> a_list[i+1]:g
i:3, item:g, a_list[i+1]:h
....

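Your second solution works because, even though the inner for loop skips items, the outer loop reruns it until a is empty, so the skipped items are eventually processed. An item is removed from b only while another copy of it is still in a, so once a has been emptied, exactly one copy of each value is left in b. In your first solution the single pass stops with items still in a, so some duplicates are never removed from b.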
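To make that concrete, here is a small sketch that runs the loop from the question one pass at a time. The helper name one_pass is made up for this illustration; its body is the same as the loop in the question.

```python
def one_pass(a, b):
    """Run the question's loop once, mutating both lists in place."""
    for x in a:
        a.remove(x)          # shrinks a while it is being iterated, so items get skipped
        if x in a:
            b.remove(x)      # drop one copy from b only if a still holds another


a = [2, 2, 1, 1, 1, 2, 3, 6, 6]
b = list(a)

# First version: a single pass stops early and leaves skipped items in a.
one_pass(a, b)
after_first = (list(a), list(b))
print("after one pass:", after_first)

# Second version, in effect: keep running passes until a is empty.
passes = 1
while a:
    one_pass(a, b)
    passes += 1
print("passes:", passes, "a:", a, "b:", b)
```

After the first pass a is [2, 1, 2, 6] and b is [2, 1, 2, 3, 6], which is the output of the first version. It takes four passes in total to empty a, and by then b is [1, 2, 3, 6]. The outer loop in the question runs len(a) times, which is always enough, because every pass over a non-empty list removes at least one item.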

score 0 · Answer 2 · answered Jan 12 '18 at 19:28

If you really don't want to use the built-in set, it is much easier to create a new list and add each element to it only if it isn't already there.

Something like:

a = [2,2,1,1,1,2,3,6,6]
c = []
for item in a:
    if item not in c:
        c.append(item)
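As a possible variation on the same idea (not part of the original answer), the membership test can be done against a set so that each lookup does not rescan the result list. This sketch assumes the items are hashable; the function name dedupe_keep_first is hypothetical. The dict.fromkeys one-liner relies on dicts preserving insertion order, which is guaranteed from Python 3.7.

```python
def dedupe_keep_first(items):
    """Return a new list with duplicates dropped, keeping first occurrences."""
    seen = set()      # values already emitted (assumes hashable items)
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


a = [2, 2, 1, 1, 1, 2, 3, 6, 6]
unique = dedupe_keep_first(a)
print(unique)                        # [2, 1, 3, 6]

# Built-in alternative with the same ordering (Python 3.7+).
via_dict = list(dict.fromkeys(a))
print(via_dict)                      # [2, 1, 3, 6]
```

Both keep the first occurrence of each value, so the order is [2, 1, 3, 6] rather than the [1, 2, 3, 6] printed by the code in the question, which keeps the last occurrence of each value. The input list is left unchanged.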

answered Jan 12 '18 at 19:28

erickrf

