Why is one repeated item left behind when removing repeated items from a list in Python?

Question

copyofnumbers = [1, 2, 3, 3, 1, 1, 4, 4, 5, 6, 7, 6, 7, 1]
copyofnumbers.sort()

for item in copyofnumbers:
    if (copyofnumbers.count(item) > 1):
        copyofnumbers.remove(item)

print(copyofnumbers)

I am trying to remove duplicate items from the list. The code above removes all the repeated items except for "1". What am I doing wrong?

[1, 1, 2, 3, 4, 5, 6, 7]

I expect the code to remove all the repeated items.

Do not *iterate* over a list you manipulate. By removing, you implicitly move the list one to the left over the cursor. — Willem Van Onsem, Jul 21 '19 at 13:51
Tangentially related, you could identify unique values using pandas: `pd.unique(copyofnumbers)` — Yaakov Bressler, Jul 21 '19 at 15:26

score 5 · Answer 1 · answered Jul 21 '19 at 14:00

You are iterating over a list while modifying it. On each iteration the "cursor" advances by one position, but when you remove an element the list shrinks and the remaining elements shift one position to the left. As a result, the cursor effectively hops two elements ahead.

Indeed, imagine the following situation:

1 2 2 2 2 4 5
  ^

Here the caret denotes the cursor of the iterator. You check whether 2 occurs multiple times. It does, so you remove 2, and Python removes the first occurrence. The next iteration then advances the caret, so the situation looks like this:

1 2 2 2 4 5
    ^

So you "skipped" over a 2. This may not look like a problem yet, since we can still remove a 2. But once we remove the next 2, the situation looks like:

1 2 2 4 5
      ^

The cursor has now moved past all the 2s, so the two remaining 2s can no longer be removed.
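The skipping can be reproduced with a small trace. This is only an illustrative sketch: the names `numbers` and `visited` are made up for the demonstration, and `enumerate` is used just to record which index the loop looks at on each iteration.

```python
numbers = [1, 2, 2, 2, 2, 4, 5]

# Record (index, item) for every element the loop actually visits.
visited = []
for index, item in enumerate(numbers):
    visited.append((index, item))
    if numbers.count(item) > 1:
        # Removing shifts everything after the removed element one
        # position to the left, while the loop's index still advances.
        numbers.remove(item)

print(visited)  # [(0, 1), (1, 2), (2, 2), (3, 4), (4, 5)]
print(numbers)  # [1, 2, 2, 4, 5]
```

The loop visits only five positions of the original seven-element list, and two 2s survive.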

That being said, using .count(..) and .remove(..) is usually not a good idea anyway. A .count(..) takes linear time to count the elements, and a .remove(..) takes, if you remove from the left, worst case linear time as well, making this a quadratic algorithm. So even if this worked, it would not be very efficient.
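For completeness, one way to make the original loop behave is to iterate over a shallow copy of the list, so that removals do not shift the cursor. This is a sketch of a workaround rather than a recommendation: it still calls .count(..) and .remove(..) for every element, so it remains quadratic.

```python
copyofnumbers = [1, 2, 3, 3, 1, 1, 4, 4, 5, 6, 7, 6, 7, 1]
copyofnumbers.sort()

# copyofnumbers[:] is a shallow copy; the loop walks the copy while
# the removals happen on the original list.
for item in copyofnumbers[:]:
    if copyofnumbers.count(item) > 1:
        copyofnumbers.remove(item)

print(copyofnumbers)  # [1, 2, 3, 4, 5, 6, 7]
```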

If the elements are hashable, and integers are hashable, we can simply convert the list to a set (and back to a list, for example with sorted), like:

sorted(set(copyofnumbers))

This gives us:

>>> sorted(set(copyofnumbers))
[1, 2, 3, 4, 5, 6, 7]
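Note that a set does not keep the original order, which is why sorted is used above. If the order of first appearance matters instead (an assumption, the question sorts the list anyway), a possible alternative is dict.fromkeys, since dictionaries preserve insertion order in Python 3.7 and later. The list `unsorted_numbers` below is a hypothetical example chosen so that the two orders differ.

```python
unsorted_numbers = [3, 1, 3, 2, 1]  # hypothetical input, not from the question

# dict keys are unique and keep the order in which they were first inserted.
unique_in_order = list(dict.fromkeys(unsorted_numbers))

print(unique_in_order)                # [3, 1, 2]
print(sorted(set(unsorted_numbers)))  # [1, 2, 3]
```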
