I am trying to remove duplicates from a list, using the code below.

>>> X
['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b']
>>> for i in range(X_length) :
...  j=i+1
...  if X[i] == X[j] :
...   X.pop([j])

But I am getting

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range

Please help.

Raja G
  • Is some of your code missing? What is `j`? In any case, I assume the issue is that you're shortening the list as you go. By the time `i` reaches its maximum value, the list is no longer that long, so you have an index error. – user94559 Jul 24 '16 at 05:54
  • what is X_length and j? – kaitian521 Jul 24 '16 at 05:54
  • What's `X_length`? What's `j`? What's `X.pop([j])` supposed to be? – Aran-Fey Jul 24 '16 at 05:54
  • Updated my question. – Raja G Jul 24 '16 at 05:55
  • @Raja Try just printing out the value of `j` right before the `if` line. Hopefully the current issue with your code will become obvious from that. (You'll immediately run into another issue, but at least start there.) – user94559 Jul 24 '16 at 05:58
  • Don't modify an object as you are iterating over it. `range(X_length)` is too long after the `pop`. – Mark Tolonen Jul 24 '16 at 05:59
  • What do you consider a duplicate? Consecutive or anywhere in the string? – pylang Jul 24 '16 at 06:15
  • you can do it with `set(x)`, it removes the duplicates. for more details refer http://stackoverflow.com/questions/7961363/removing-duplicates-in-lists – caldera.sac Jul 24 '16 at 06:20
  • This question is not a duplicate of the one marked by [TigerhawkT3](http://stackoverflow.com/users/2617068/tigerhawkt3). They both concern IndexErrors, but there are different problems within this code that haven't been addressed in any answers and can't be addressed with the other thread. This is wasting an opportunity to teach beginners how Python works. – dmlicht Jul 24 '16 at 06:59

4 Answers

When you start to remove items from a list, it changes in size. So, the ith index may no longer exist after certain removals:

>>> x = ['a', 'b', 'c', 'd', 'e']
>>> x[4]
'e'
>>> x.pop()
'e'
>>> x[4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

A simpler way to remove duplicate items is to convert your list to a set, which can only contain unique items. If you must have it as a list, you can convert it back to a list: list(set(X)). However, order is not preserved here.
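If order does matter, one option that is not part of the original answer is `dict.fromkeys`. Dictionary keys are unique, and dictionaries keep insertion order in Python 3.7 and later, so the first occurrence of each item survives in place. A minimal sketch, assuming every item is hashable (the name `deduped` is just for illustration):

```python
X = ['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b']

# dict keys are unique and keep insertion order (Python 3.7+),
# so later repeats are dropped while first-seen order is preserved.
deduped = list(dict.fromkeys(X))
print(deduped)  # ['a', 'b', 'c', 'd', 'e', 'f']
```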


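If you want to remove consecutive duplicates, consider building a new list that stores only the items that differ from their successor:

unique_x = []
for i in range(len(x) - 1):
    # Keep x[i] only when the next item is different.
    if x[i] != x[i+1]:
        unique_x.append(x[i])
# The last item has no successor to compare against, so add it explicitly
# (skipped for an empty list, where x[-1] would raise IndexError).
if x:
    unique_x.append(x[-1])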

Note that our range bound is len(x) - 1 because otherwise x[i+1] would index past the end of the list on the final iteration.
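The same consecutive-duplicate removal can probably be written without any index arithmetic by using `itertools.groupby` from the standard library, which groups runs of equal adjacent items. This is an alternative sketch rather than the answer's own method; the sample list is made up for illustration.

```python
from itertools import groupby

x = ['a', 'b', 'b', 'c', 'd', 'd', 'd', 'e', 'f', 'f', 'a']

# groupby yields one (key, group) pair per run of equal adjacent items,
# so keeping only the keys collapses each run to a single element.
unique_x = [key for key, _group in groupby(x)]
print(unique_x)  # ['a', 'b', 'c', 'd', 'e', 'f', 'a']
```

Because there is no `x[i+1]` lookup, this also handles an empty list without a special case. Note that the trailing 'a' is kept: only adjacent repeats are removed.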

Rushy Panchal
  • What if the input list is `['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b', 'a', 'a']`? The output should be `['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b', 'a']`, right? – gaganso Jul 24 '16 at 06:16
  • @SilentMonk Yes, as long as you append the last value onto the new list. – Rushy Panchal Jul 24 '16 at 06:21

@Rushy's answer is great and probably what I would recommend.

That said, if you want to remove consecutive duplicates and you want to do it in-place (by modifying the list rather than creating a second one), one common technique is to work your way backwards through the list:

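def remove_consecutive_duplicates(lst):
    # Walk from the last index down to 1 so that lst[i-1] is always valid
    # and the pair (lst[1], lst[0]) is also compared.
    for i in range(len(lst) - 1, 0, -1):
        if lst[i] == lst[i-1]:
            lst.pop(i)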

x = ['a', 'b', 'b', 'c', 'd', 'd', 'd', 'e', 'f', 'f']
remove_consecutive_duplicates(x)
print(x) # ['a', 'b', 'c', 'd', 'e', 'f']

By starting at the end of the list and moving backwards, you avoid the problem of running off the end of the list because you've shortened it.

E.g. if you start with 'aabc' and move forwards, you'll use the indexes 0, 1, 2, and 3.

0
|
aabc

(Found a duplicate, so remove that element.)

 1
 |
abc

  2
  |
abc

   3
   |
abc  <-- Error! You ran off the end of the list.

Going backwards, you'll use the indexes 3, 2, 1, and 0:

   3
   |
aabc

  2
  |
aabc

 1
 |
aabc

(Found a duplicate so remove that element.)

0
|
abc <-- No problem here!
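
If you would rather walk forwards, one possible alternative (not from the answer above) is a `while` loop that re-checks the length on every pass and only advances when nothing was removed. A minimal sketch; the function name is hypothetical:

```python
def remove_consecutive_duplicates_forward(lst):
    """Remove adjacent repeats in place, scanning left to right."""
    i = 0
    # len(lst) is re-evaluated on every pass, so the bound shrinks with the list.
    while i < len(lst) - 1:
        if lst[i] == lst[i + 1]:
            # Stay at i: the new neighbour may be a duplicate as well.
            lst.pop(i + 1)
        else:
            i += 1

letters = list('aabc')
remove_consecutive_duplicates_forward(letters)
print(letters)  # ['a', 'b', 'c']
```

Unlike `for i in range(...)`, whose bound is fixed before the loop starts, the `while` condition sees the current length, so the index never runs off the end.
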
user94559

It is generally not advised to mutate a sequence while iterating over it, because its length and indexes change underneath the loop. Here are some other approaches:

Given:

X = ['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b']

If you are only interested in removing duplicates from a list (and order does not matter), you can use a set:

list(set(X))
# Out (order may vary): ['a', 'c', 'b', 'e', 'd', 'f']

If you want to maintain order and remove duplicates anywhere in the list, you can iterate while making a new list:

X_new = []
for i in X:
    if i not in X_new:
        X_new.append(i)

X_new
# Out: ['a', 'b', 'c', 'd', 'e', 'f']
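
For longer lists, the `i not in X_new` check scans the new list on every iteration, which makes the loop quadratic. A common variation, sketched here under the assumption that the items are hashable, tracks what has been seen in a set so that each membership test takes constant time on average. The name `seen` is illustrative only.

```python
X = ['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b']

seen = set()   # items already emitted, for fast membership tests
X_new = []     # result, in first-seen order
for item in X:
    if item not in seen:
        seen.add(item)
        X_new.append(item)

print(X_new)  # ['a', 'b', 'c', 'd', 'e', 'f']
```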

If you would like to remove consecutive duplicates, consider @smarx's answer.

pylang

In the last iteration of your loop (assuming X_length is len(X)), i is 7, so j is set to i + 1, which equals the length of the list: 8 in this case. You then try to access X[j], but the valid indexes only run from 0 to 7, so j is past the end of the list.

Instead, simply convert the list to a set:

>>> set(X)
{'e', 'f', 'd', 'c', 'a', 'b'}

unless you need to preserve order, in which case you'll need to look elsewhere for an ordered set.
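
To see the off-by-one from the first paragraph concretely, the short sketch below reproduces only the final iteration of the loop. It assumes `X_length` was set to `len(X)`, which the question never shows.

```python
X = ['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b']
X_length = len(X)  # assumption: the question does not show how X_length is set

last_i = X_length - 1  # final value produced by range(X_length)
j = last_i + 1         # what the loop body computes on that pass
print(last_i, j, len(X))  # 7 8 8

try:
    X[j]
except IndexError as exc:
    message = str(exc)
    print(message)  # list index out of range
```

Separately, `X.pop([j])` passes a list where `pop` expects an integer index, so that line would raise a TypeError if it were ever reached; with this particular X no two neighbours are equal, so the IndexError comes first.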

Sam Whited