I have a list of stock tickers which I've scraped from the web and I'm trying to remove 'n/a' values from.

Here's a snippet of what the list looks like before trying to remove the values:

 ticker_list = ['BANR',
 'AUB',
 'HOPE',
 'INDB',
 'CVBF',
 'FFBC',
 'FRME',
 'TRMK',
 'n/a',
 'n/a']

So here is what I tried to run to remove those values:

for x in ticker_list:
    if x == 'n/a':
        ticker_list.remove(x)

This code partly works. It removes one of the n/a values, resulting in this:

['BANR',
 'AUB',
 'HOPE',
 'INDB',
 'CVBF',
 'FFBC',
 'FRME',
 'TRMK',
 'n/a']

I've also tried the following:

for x in ticker_list:
    if x.strip() == 'n/a':
        ticker_list.remove(x)

Also this:

for x in ticker_list:
    if 'n/a' in x.strip():
        ticker_list.remove(x)

In all cases, I get the same result. It removes just one of the n/a values, but one remains.

Is this some sort of encoding thing, or am I doing something dumb?

Thanks a lot for any responses!

1 Answer

The problem is that you are removing elements from the list while iterating over it. In practice this is "undefined behaviour": the loop raises no error, but it can skip elements.

You can get the result you want with a list comprehension, which builds a new list instead of mutating the old one:

ticker_list = [value for value in ticker_list if value != "n/a"]

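Removal during iteration misbehaves because a list iterator only tracks an index into the list. When you remove an element, everything after it shifts one position to the left, so on the next step the iterator moves past an element it never looked at and can reach the end of the shortened list early. Other containers react differently to being modified mid-loop. For example: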

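  • Removing from a list raises no error, but the loop skips the element right after each removal, so one 1 is left behind:
def remove(values):
    for value in values:
        if value == 1:
            values.remove(value)
    return values  # return the mutated list so there is something to print

print(remove([1, 1, 2, 3, 4, 5, 1])) # [2, 3, 4, 5, 1]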
  • Removing from a set will raise RuntimeError: Set changed size during iteration.
def remove(values):
    for value in values:
        if value == 1:
            values.remove(value)

print(remove({1, 1, 2, 3, 4, 5})) # raises RuntimeError
  • Removing from a dict will raise RuntimeError: dictionary changed size during iteration.
def remove(values):
    for key, value in values.items():
        if key == 1:
            del values[key]

print(remove({1: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f'})) # raises RuntimeError
  • Removing from a collections.deque will raise RuntimeError: deque mutated during iteration.
import collections

def remove(values):
    for value in values:
        if value == 1:
            values.remove(value)

print(remove(collections.deque([1, 1, 2, 3, 4, 5]))) # raises RuntimeError
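
To see why the list case drops elements without complaint, it may help to trace the loop by hand. The sketch below assumes CPython's behaviour, where the list iterator keeps an internal index and re-checks the list's current length on every step. The `trace_removal` helper is made up for illustration; it mimics the `for` loop with an explicit index.

```python
def trace_removal(values, target):
    """Mimic `for x in values: if x == target: values.remove(x)`.

    Returns the indices the loop actually visited.
    """
    visited = []
    index = 0
    # A list iterator effectively reads values[index], then advances the
    # index, and stops once the index reaches the *current* length.
    while index < len(values):
        visited.append(index)
        if values[index] == target:
            # remove() deletes the first match and shifts everything after
            # it one slot left; the index still advances, so one item is
            # never examined.
            values.remove(target)
        index += 1
    return visited

tickers = ['FRME', 'TRMK', 'n/a', 'n/a']
visited = trace_removal(tickers, 'n/a')
print(visited)  # [0, 1, 2]
print(tickers)  # ['FRME', 'TRMK', 'n/a']
```

Index 3 is never visited: by the time the loop would reach it, the list has only three elements left, so the second 'n/a' is never compared.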

Of all of these, only list fails silently. The Python docs also note that changing a list while iterating over it is unsafe:

It is sometimes tempting to change a list while you are looping over it; however, it is often simpler and safer to create a new list instead.
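
If other parts of the program hold a reference to the same list object, rebinding the name to a new list will not update them. Two commonly used in-place variants are sketched below. The variable names are illustrative only, and which option fits depends on your code.

```python
ticker_list = ['BANR', 'AUB', 'n/a', 'HOPE', 'n/a', 'n/a']
alias = ticker_list  # a second reference to the same list object

# Option 1: slice assignment replaces the contents but keeps the same object,
# so every reference sees the filtered result.
ticker_list[:] = [t for t in ticker_list if t != 'n/a']
print(alias)  # ['BANR', 'AUB', 'HOPE']

# Option 2: iterate over a shallow copy and mutate the original.
other = ['n/a', 'n/a', 'FFBC', 'n/a']
for t in other[:]:
    if t == 'n/a':
        other.remove(t)
print(other)  # ['FFBC']
```

Option 1 is usually the cheaper of the two, since each `remove()` call in option 2 has to search the list again.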

enzo
  • That solved it. Thanks a lot. Do you mind expanding on why removing elements while iterating is "undefined behavior?" – dogsplayingpoker May 18 '21 at 02:25
  • Of course I don't, I've added an explanation, see if this helps – enzo May 18 '21 at 02:59
  • Oh, I think I see your point. So because I removed element N1, it tries to go on to N2, but now there's fewer elements in the list so it assumes the job is done, kind of, which is why it removes the first n/a value, but not the last? You mentioned that changing a list while looping over it is potentially dangerous. Does this apply to appending a list? Like iterating over a dataset and appending the extracted value to an empty list? – dogsplayingpoker May 18 '21 at 06:30
  • You're indeed correct in your first assumption. About your second one, since you're iterating over a dataset and not the list you're appending to, that's ok. But if you're iterating over a list and appending to that same list, that's bad practice and it can lead to subtle bugs (e.g. if you create a list with two elements, iterate over it and append to it once, you may expect the loop to run twice when it actually runs three times, because the iterator also reaches the appended element). – enzo May 18 '21 at 12:11
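
Whether a loop sees an element appended mid-iteration can be checked with a short experiment. This is a sketch of what CPython does with a plain list; the names are made up for illustration.

```python
items = ['a', 'b']
seen = []
appended = False

for item in items:
    seen.append(item)
    if not appended:
        items.append('c')  # grow the list once, mid-loop
        appended = True

print(seen)  # ['a', 'b', 'c']
```

The loop body runs three times even though the list started with two elements. Appending on every pass would never terminate, since the iterator keeps finding a new last element.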