I have a dictionary and a list as given below

correction =  {u'drug.ind': u'Necrosis', "date": "exp"}
drugs =  [[u'drug.aus', u'Necrosis'], [u'drug.nz', u'Necrosis'], [u'drug.uk', u'Necrosis'], [u'drug.ind', u'Necrosis'], [u'cheapest', u'drug.ind'], [u'date', u'']]

For each value in the correction dictionary, I want to remove every sublist in drugs whose second element matches that value.

This is what I do

if correction and drugs:
    for i,x in correction.items():
        for j,k in enumerate(drugs):
            if len(i.split(".")) > 1:  # need to do the operation only for drugs which is always given in this format
                if x == k[1]:
                    drugs.pop(j)

Ideally the drugs list should now look like

drugs = [['cheapest', 'drug.ind'], ['date', '']]

But for some reason it looks like

[['drug.nz', 'Necrosis'], ['drug.ind', 'Necrosis'], ['cheapest', 'drug.ind'], ['date', '']]

I was hoping that every entry containing Necrosis would be removed, but only every other one is removed.

Why do I encounter this behaviour? What am I doing wrong?

Souvik Ray
  • 2
    You shouldn't change a list you are iterating over. If you do, don't expect the indexes to match. – Klaus D. Feb 23 '19 at 21:24
  • Don't modify the thing you're iterating over. You're modifying `drugs` while iterating through it. – Paritosh Singh Feb 23 '19 at 21:25
  • 1
    See this [answer](https://stackoverflow.com/questions/4081217/how-to-modify-list-entries-during-for-loop/4082739#4082739) of mine to the question titled [python: iterate a specific range in a list](https://stackoverflow.com/questions/5501725/python-iterate-a-specific-range-in-a-list). – martineau Feb 23 '19 at 22:03

4 Answers

3

You are iterating over the list (drugs), and inside the loop, you are removing elements from the same list.

When a for loop runs over a list, Python keeps an internal "index" variable that tracks the current position in the list and increments it after every iteration.

Suppose that, inside the loop, you delete the item at index 3. The rest of the list (the items you haven't iterated over yet) shifts one place to the left, so the item that was at index 4 now sits at index 3, the slot vacated by the removed item. To process this shifted item, the internal index would have to stay at 3 for the next iteration. Instead, Python increments it from 3 to 4, as it normally does between iterations. As a result, the item immediately following the removed one is never examined by the body of your for loop, so it is not removed even if it meets the criteria for removal.
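The skipping can be made visible with a minimal sketch that imitates the loop using an explicit index. The names `items` and `visited` are made up for this illustration, and the real list iterator is implemented inside the interpreter, but for this purpose it behaves the same way:

```python
# Hypothetical illustration: imitate a for loop over a list with an explicit index.
items = ['a', 'b', 'c', 'd']
visited = []

index = 0
while index < len(items):
    current = items[index]
    visited.append(current)
    items.pop(index)   # remove the current item; later items shift left by one
    index += 1         # the index still advances, so the shifted item is skipped

print(visited)  # ['a', 'c']
print(items)    # ['b', 'd']
```

Only 'a' and 'c' are ever visited; 'b' and 'd' slide into slots the index has already passed.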

Several solutions

Several methods for doing "safe" deletes are suggested in this thread.

I've picked my favorite one from those, and implemented it for your code, below:

correction =  {u'drug.ind': u'Necrosis', "date": "exp"}
drugs =  [[u'drug.aus', u'Necrosis'], [u'drug.nz', u'Necrosis'], [u'drug.uk', u'Necrosis'],
          [u'drug.ind', u'Necrosis'], [u'cheapest', u'drug.ind'], [u'date', u'']]

if correction and drugs:
    for i,x in correction.items():
        for j in range(len(drugs)-1, -1, -1):
            if len(i.split(".")) > 1:  # need to do the operation only for drugs which is always given in this format
                if x == drugs[j][1]:
                    drugs.pop(j)
print(drugs)

The output of this is:

[['cheapest', 'drug.ind'], ['date', '']]

The crucial part of this solution is the line `for j in range(len(drugs)-1, -1, -1)`. We are now iterating over the indices instead of over the items at those indices, and we are doing so in reverse order (which effectively means we process the list from the end). Popping an item then only shifts items that have already been processed, so nothing is skipped.
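If index bookkeeping is not needed at all, another option is to build the filtered list first and assign it back afterwards. This is a minimal sketch using the same `correction` and `drugs` data; the name `to_remove` is introduced here purely for illustration:

```python
correction = {'drug.ind': 'Necrosis', 'date': 'exp'}
drugs = [['drug.aus', 'Necrosis'], ['drug.nz', 'Necrosis'], ['drug.uk', 'Necrosis'],
         ['drug.ind', 'Necrosis'], ['cheapest', 'drug.ind'], ['date', '']]

# Values to remove: only those whose key has the dotted "drug.xxx" form.
to_remove = {value for key, value in correction.items() if len(key.split('.')) > 1}

# Slice assignment replaces the contents of the existing list object in place.
drugs[:] = [pair for pair in drugs if pair[1] not in to_remove]

print(drugs)  # [['cheapest', 'drug.ind'], ['date', '']]
```

Because the comprehension reads the old list in full before the slice assignment writes anything, no element is skipped.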

fountainhead
2

Because when you pop an item from the list, every later item moves down by one index, so the next item ends up 'behind' the iterator.

In the example below, print() only runs for every other item in the list. On the face of it we are iterating through the list and deleting every member, but we end up deleting only half of them.

example = ['apple','banana','carrot','donut','edam','fromage','ghee','honey']

for index,food in enumerate(example):
    print(food)
    example.pop(index)

print(example) 

This is because a for loop is (basically) incrementing an integer i on each pass and fetching example[i]. As you pop elements from example, the later elements change position, so the value found at example[i] changes too.

This code demonstrates the effect: after we 'pop' an element, the value at the same index changes in front of our eyes.

example = ['apple','banana','carrot','donut','edam','fromage','ghee','honey']


# Each pop shrinks the list, so only loop half as many times to stay in range
for i in range(len(example) // 2):
    print("The value of example[", i, "] is:", example[i])
    example.pop(i)
    print("After popping, the value of example[", i, "] is:", example[i])

print(example)
JeffUK
2

As mentioned by others, you shouldn't change a list or other iterable when you are iterating over it. If you want to remove certain elements, you should create a list of those items you want to remove, and remove them afterwards:

bad = []
for j, k in enumerate(drugs):
    if len(i.split(".")) > 1:
        if x == k[1]:
            bad.append(k)
for item in bad:
    drugs.remove(item)

As fountainhead pointed out, this solution can fail when drugs contains equal elements and the removal condition also depends on the index: remove() deletes the first element that compares equal, which may not be the one that was flagged. A more general solution might be this one:

import itertools

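# i and x come from the enclosing loop over correction.items()
keep = []
for k in drugs:
    # compress() keeps the items whose selector is true, so flag the ones to keep
    if len(i.split(".")) > 1 and x == k[1]:
        keep.append(False)
    else:
        keep.append(True)
drugs = list(itertools.compress(drugs, keep))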
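The difference between removing by value and removing by position only shows up when the list has duplicates and the condition depends on the index. Here is a minimal sketch with made-up data; the list `values` and the odd-index rule are hypothetical and not part of the question:

```python
# Hypothetical data: duplicates plus a condition that depends on the index.
values = ['x', 'y', 'x', 'x']

# Goal: drop 'x' only where it sits at an odd index (here, only index 3).
bad = [v for idx, v in enumerate(values) if v == 'x' and idx % 2 == 1]

# Removing by value: remove() deletes the FIRST 'x', which is at index 0.
with_remove = list(values)
for item in bad:
    with_remove.remove(item)

# Removing by position: decide per index which items to keep.
with_mask = [v for idx, v in enumerate(values) if not (v == 'x' and idx % 2 == 1)]

print(with_remove)  # ['y', 'x', 'x']
print(with_mask)    # ['x', 'y', 'x']
```

Only the position-based version removes the element that was actually flagged.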
Aemyl
  • 2
    An alternative solution is to iterate over the `reversed()` list so that removing an element does not affect the others' index; – Guimoute Feb 23 '19 at 22:15
  • @Guimoute that's a nice solution, never thought of that – Aemyl Feb 23 '19 at 22:44
  • @Aemyl yes now that I understand why it happens, this is what I will do next. – Souvik Ray Feb 23 '19 at 22:45
  • 1
    @Aemyl: Even if this solution works for OP, **as a general solution, what it does is not always what you want**. The reason is that the removal is being done with `remove()`, which removes an item based on its value being matched by `==` operator. Consider a hypothetical situation in which the decision to remove involves an additional criterion that is dependent on the index. Eg, in a hypothetical variation of OP's problem, decision to remove could depend upon an additional condition of index being odd or even. Your second `for` loop could then misbehave because it completely ignores index. – fountainhead Feb 24 '19 at 23:18
  • @fountainhead the only way i can think of where this causes a problem is if there are equal items in your list. Nice thought though, I will add a general solution. – Aemyl Feb 25 '19 at 06:35
  • @Aemyl: Yes, in my hypothetical example, I wanted to also mention that there are duplicates in the input list. But I could not mention it because I hit the character limit for the comment. Sorry, to make you figure that out on your own ! – fountainhead Feb 25 '19 at 06:39
1

You can create a set from the values of the correction dictionary (for a fast lookup) and use the function filter() to filter the list:

corr = set(correction.values())

list(filter(lambda x: x[1] not in corr, drugs))
# [['cheapest', 'drug.ind'], ['date', '']]
Mykola Zotko