I am a beginner in Python. I have learned other languages before, such as C++ (beginner) and jQuery, but I find looping in Python quite confusing.

Well, I want to achieve a simple result. The program should loop through a list of words and remove every word whose first two letters match the first two letters of the next word in the list:

test = ['aac', 'aad', 'aac', 'asd', 'msc']
for i in range(len(test)):
    if test[i][0:2] == test[i+1][0:2]:
        test.remove(test[i])

# This should output only ['aac', 'asd', 'msc']
print test

The code above should remove the first 'aac' and 'aad' from the list. Instead, it raises an IndexError, and I wasn't able to get the desired result either. Can you please explain why?

Martijn Pieters
Ahmed Sadman

4 Answers

You are changing the length of the list while looping over a range that goes up to the starting length of the list; remove one item from the list and the last index is no longer valid.

Moreover, because items are removed from the list at the current index, the rest of the list indices shift; what was at index i + 1 is now at index i, and your loop index is no longer useful.

Last but not least, you are looping until the very last index of test but still try to access test[i + 1]; that index does not exist even if you were not removing elements from the list.
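To see the first two problems in action, here is a small sketch that replays the question's loop on a copy of the list and records a snapshot at the start of each iteration (the helper name `trace_original_loop` is hypothetical, chosen for illustration). The first 'aac' is removed at i = 0, 'aad' slides into index 0 and is never checked, and the loop fails at i = 3 because the shrunken list has no index 4:

```python
def trace_original_loop(words):
    """Replay the question's loop on a copy of `words`.

    Returns (steps, error): steps is a list of (i, snapshot) pairs taken at
    the start of each iteration; error is the IndexError raised, or None.
    """
    test = list(words)  # work on a copy so the caller's list is untouched
    steps = []
    try:
        for i in range(len(test)):  # range is fixed at the starting length
            steps.append((i, list(test)))
            if test[i][0:2] == test[i + 1][0:2]:
                test.remove(test[i])
    except IndexError as exc:
        return steps, exc
    return steps, None


steps, error = trace_original_loop(['aac', 'aad', 'aac', 'asd', 'msc'])
for i, snapshot in steps:
    print(i, snapshot)
print(type(error).__name__)
```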

You could use a while loop to achieve what you want to do:

test = ['aac', 'aad', 'aac', 'asd', 'msc']
i = 0
while i < len(test) - 1:
    if test[i][:2] == test[i+1][:2]:
        del test[i]
        continue
    i += 1

Now i is tested against the new length each loop iteration, and we only increment i if no element was removed. Note that the loop is limited to the length minus 1 because you want to test for test[i + 1] each iteration.
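The same loop can also be written with `else:` instead of `continue`, so that i only advances when nothing was deleted. A minimal sketch wrapped in a function (the name `drop_if_next_shares_prefix` is hypothetical):

```python
def drop_if_next_shares_prefix(words):
    """Remove, in place, every word whose first two letters match the next word's."""
    i = 0
    while i < len(words) - 1:  # len() is re-evaluated on every iteration
        if words[i][:2] == words[i + 1][:2]:
            del words[i]       # the next word slides into position i
        else:
            i += 1             # only move on when nothing was removed
    return words


test = ['aac', 'aad', 'aac', 'asd', 'msc']
drop_if_next_shares_prefix(test)
print(test)  # ['aac', 'asd', 'msc']
```

Both forms behave the same; `continue` just avoids the extra indentation level.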

Note that I use del test[i]; there's no need to scan through the list searching for the value to remove again. Doing so could also lead to subtle bugs if values appear multiple times in the list but only later instances should be removed; e.g. ['aac', 'foo', 'aac', 'aad'] should result in ['aac', 'foo', 'aad'], not ['foo', 'aac', 'aad'], which is what test.remove(test[i]) would produce.
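The difference is easy to check. A small sketch running the same while loop twice on ['aac', 'foo', 'aac', 'aad'], once deleting by index and once with list.remove() (the `use_del` switch and function name are hypothetical, added only for this comparison):

```python
def drop_matching(words, use_del=True):
    i = 0
    while i < len(words) - 1:
        if words[i][:2] == words[i + 1][:2]:
            if use_del:
                del words[i]            # removes exactly the item at index i
            else:
                words.remove(words[i])  # removes the FIRST item equal to words[i]
            continue
        i += 1
    return words


by_index = drop_matching(['aac', 'foo', 'aac', 'aad'], use_del=True)
by_value = drop_matching(['aac', 'foo', 'aac', 'aad'], use_del=False)
print(by_index)  # ['aac', 'foo', 'aad']
print(by_value)  # ['foo', 'aac', 'aad']
```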

Demo:

>>> test = ['aac', 'aad', 'aac', 'asd', 'msc']
>>> i = 0
>>> while i < len(test) - 1:
...     if test[i][:2] == test[i+1][:2]:
...         del test[i]
...         continue
...     i += 1
... 
>>> test
['aac', 'asd', 'msc']

You could use a list comprehension to avoid the shrinking list problem:

>>> test = ['aac', 'aad', 'aac', 'asd', 'msc']
>>> [t for i, t in enumerate(test) if i == len(test) - 1 or t[:2] != test[i + 1][:2]]
['aac', 'asd', 'msc']

Both approaches require only one loop through the input list.
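A variation on the list comprehension, swapping in zip() to pair each word with its successor instead of indexing with enumerate(); this is an alternative technique, not the one shown above, and the helper name is hypothetical. The last word has no successor, so it is always appended:

```python
def keep_if_next_differs(words):
    """Return a new list without the words whose 2-letter prefix matches the next word's."""
    # zip(words, words[1:]) yields (word, next_word) pairs and stops one short
    # of the end; words[-1:] re-adds the last word (and is empty for []).
    kept = [cur for cur, nxt in zip(words, words[1:]) if cur[:2] != nxt[:2]]
    return kept + words[-1:]


print(keep_if_next_differs(['aac', 'aad', 'aac', 'asd', 'msc']))  # ['aac', 'asd', 'msc']
```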

Martijn Pieters
  • I understand now. At first I thought that range(len(test)) would get updated as the list items got removed, but later I understood that my thought was a foolish one! I would use the while method because it looks best to me. Thanks. One question: you used the continue statement, but is that really necessary? – Ahmed Sadman Oct 11 '13 at 16:26
  • If you don't use `continue` then you'd have to use `else:`; you don't want `i += 1` to run when you just removed `test[i]`. – Martijn Pieters Oct 11 '13 at 16:30
  • The method suggested by @Manoj should've worked partially. That method is able to handle the "i+1 not existing" error, but the result is totally unexpected. That code only removes the first item of the list and outputs ['aad', 'aac', 'asd', 'msc'] – Ahmed Sadman Oct 11 '13 at 16:40

As you remove items from the list, range(len(test)) still holds the same values. So even when your test list has fewer items left, the loop still goes over all the original indices.
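A quick check (written so it runs on Python 3) that the range is built once from the starting length and does not shrink along with the list:

```python
test = ['aac', 'aad', 'aac', 'asd', 'msc']
indices = range(len(test))  # evaluated once: covers 0..4
test.remove('aac')          # the list now has 4 items
print(list(indices))        # [0, 1, 2, 3, 4] (unchanged)
print(len(test))            # 4, so index 4 is no longer valid
```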

I have two solutions:

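  1. Copy the items you want to keep to a new list (test2 = [] before the loop), so instead of deleting an item:

    test2.append(test[i])
    

    And don't forget to reverse the condition: copy a word when its first two letters differ from those of the next word.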

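  2. Loop backwards, like this:

    n = len(test)
    for i in range(n):
        j = n - i - 1  # walk the indices from n - 1 down to 0
        # compare each word with its successor and delete the earlier one;
        # indices below j - 1 are not shifted by the deletion
        if j >= 1 and test[j - 1][0:2] == test[j][0:2]:
            del test[j - 1]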
    

    Or, as Martijn suggested, let range() count down from n - 1 to 1:

    n = len(test)
    for i in range(n - 1, 0, -1):
        # delete the earlier word when it shares a prefix with the next one
        if test[i - 1][0:2] == test[i][0:2]:
            del test[i - 1]
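Solution 1 above is only a fragment; a minimal sketch of the complete loop, assuming test2 starts out empty and using a hypothetical function name. The last word has no successor, so it is always copied (the `or` short-circuits before test[i + 1] is touched):

```python
def copy_kept_words(test):
    """Build test2 from the words whose 2-letter prefix differs from the next word's."""
    test2 = []
    for i in range(len(test)):
        is_last = i == len(test) - 1
        if is_last or test[i][0:2] != test[i + 1][0:2]:
            test2.append(test[i])
    return test2


print(copy_kept_words(['aac', 'aad', 'aac', 'asd', 'msc']))  # ['aac', 'asd', 'msc']
```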
    

Hope it helps!

P.S. Sorry for my previous, careless answer.

aIKid
  • Well, he's not technically iterating over a list while removing items from it. He's iterating over `range(len(test))` and deleting items from `test`, not iterating over `test` while deleting from it. The problem is that he needs to pop an element off of `range(len(test))` every time he kills something in `test` – inspectorG4dget Oct 11 '13 at 07:02
  • Also, you're still removing from `test`, which will cause the same error all over again – inspectorG4dget Oct 11 '13 at 07:03
  • `test` and `test2` start out with equal size. But as you delete things in `test2`, its size shrinks, which means that `test[i]` and `test2[i]` won't refer to the same object anymore. Hence, you might still run into an IndexError here. Further, `test2=test` makes both variables refer to the same list, not two separate copies of `test`. So `test2.remove(…)` is equivalent to `test.remove(…)` in this case. I strongly recommend testing your code before posting it – inspectorG4dget Oct 11 '13 at 07:08
  • Nah, really fixed it now. I didn't think at all before. Sorry sir! – aIKid Oct 11 '13 at 07:30
  • Instead of inverting `i`, why not use `range()` to loop backwards? `range(len(test) - 1, 0, -1)`; this loops from `len(test) - 1` to `1`, downwards. – Martijn Pieters Oct 11 '13 at 09:55
  • Whoa, I didn't think of it before, thanks! Can I include it in my solution? – aIKid Oct 11 '13 at 09:56
As others have said, removing items makes the list shorter, which causes an IndexError.

Keeping in line with the original question: if you're looking to remove items using list.remove(), you can add the found items to a separate list, then iterate over that list and remove each item from your original list, like so:

# Set up the variables
test = ['aac', 'aad', 'aac', 'asd', 'msc']
found = []
# Loop over the range of the length of the list
for i in range(len(test)):
    try:
        if test[i].startswith(test[i+1][0:2]):
            found.append(test[i])  # Add the found item to the found list
    except IndexError: # You'll hit this when you do test[i+1]
        pass

# Remove the Items at this point so you don't cause any issues
for item in found:
    test.remove(item)  # If an item has been found remove the first instance

# This should output only ['aac', 'asd', 'msc']
print(test)
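Because test.remove(item) always removes the first equal element, this two-pass approach can delete the wrong duplicate: for ['aac', 'foo', 'aac', 'aad'] it removes the leading 'aac'. A variation, not part of the original answer, that records indices instead of values and deletes them highest-first so the remaining indices stay valid:

```python
test = ['aac', 'foo', 'aac', 'aad']

# Record positions, not values, so equal items can't be confused.
found = [i for i in range(len(test) - 1) if test[i][:2] == test[i + 1][:2]]

# Delete from the highest index down; earlier indices are not shifted.
for i in reversed(found):
    del test[i]

print(test)  # ['aac', 'foo', 'aad']
```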

EDIT:

As per Martijn's comment, you don't need to make a second list of items that need to be removed; you can instead make a list of the items that don't need to be removed, like so:

# Set up the variables
test = ['aac', 'aad', 'aac', 'asd', 'msc']
found = []

# Loop over the range of the length of the list
for i in range(len(test)):
    try:
        if not test[i].startswith(test[i+1][0:2]):
            found.append(test[i])  # Keep the item: its prefix differs from the next item's
    except IndexError: # You'll hit this when you do test[i+1]
        found.append(test[i]) # No test[i+1] means test[i] is the last item, which is always kept


# This should output only ['aac', 'asd', 'msc']
print(found)
Noelkd
for i in range(len(test)) iterates over the valid indices of test. However, as you keep deleting items from test in the loop, test shrinks, causing some of those originally valid indices to become invalid.

What you're doing is something like this:

L = range(len(test))
for i in L:
  if condition:
    # remove something from test <- the size of test has changed.
                                 # L[-1] is no longer a valid index in test

What you could do instead is accumulate the indices of the items you'd like to delete, and delete them afterwards:

deleteThese = set()
for i,item in enumerate(test[:-1]):
  if item[0:2] == test[i+1][0:2]:
    deleteThese.add(i)
test = [item for i,item in enumerate(test) if i not in deleteThese]

Output:

In [70]: test = ['aac', 'aad', 'aac', 'asd', 'msc']

In [71]: %paste
deleteThese = set()
for i,item in enumerate(test[:-1]):
  if item[0:2] == test[i+1][0:2]:
    deleteThese.add(i)
test = [item for i,item in enumerate(test) if i not in deleteThese]

## -- End pasted text --

In [72]: test
Out[72]: ['aac', 'asd', 'msc']
inspectorG4dget