Why is re not removing some values from my list?

Question

I'm asking more out of curiosity at this point since I found a work-around, but it's still bothering me.

I have a list of dataframes (x) that all have the same column names. I'm trying to use pandas and re to make a list of the subset of column names that have the format

"D(number) S(number)"

so I wrote the following function:

def extract_sensor_columns(x): 
    sensor_name = list(x[0].columns)
    for j in sensor_name:
        if bool(re.match('D(\d+)S(\d+)', j))==False:
            sensor_name.remove(j)

    return sensor_name

The list that I'm generating has 103 items (98 wanted items, 5 unwanted items). This function removes three of the five columns that I want to get rid of, but keeps the columns labeled 'Pos' and 'RH'. I generated the sensor_name list outside of the function and tested the truth value of

bool(re.match('D(\d+)S(\d+)', sensor_name[j]))

for all five of the items that I wanted to get rid of and they all gave the False value. The other thing I tried is changing the conditional to ==True, which even more strangely gave me 54 items (all of the unwanted column names and half of the wanted column names).

If I rewrite the function to add the column names that have a given format (rather than remove column names that don't follow the format), I get the list I want.

def extract_sensor_columns(x): 
    sensor_name = []
    for j in list(x[0].columns):
        if bool(re.match('D(\d+)S(\d+)', j))==True:
            sensor_name.append(j)

    return sensor_name

Why is the first block of code acting so strangely?

It looks like you are iterating over a list and deleting elements of the same list at the same time, which can give strange results. Some alternative ways: https://stackoverflow.com/questions/1207406/how-to-remove-items-from-a-list-while-iterating – Shaido, Jan 13 '23 at 08:39

score 1 · Accepted Answer · answered Jan 13 '23 at 08:40

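In general, do not modify a list while iterating over it. In the first (wrong) version you remove elements from the very list the for loop is walking. The loop keeps an internal index into the list; when you remove the current element, everything after it shifts one position to the left, so the next element lands on the index that was just visited and is skipped. Any unwanted name that directly follows another removed name therefore survives, which is presumably what happens to 'Pos' and 'RH'. In the second (correct) version you only append matching names to a separate, initially empty list, so the list being iterated never changes.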
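To see the skipping on names like the ones in the question, here is a minimal sketch. The column list is made up for illustration (only 'Pos' and 'RH' are taken from the question), and `buggy_filter` is a hypothetical stand-in for the first function that takes plain names instead of dataframes.

```python
import re

# Hypothetical column names; only 'Pos' and 'RH' appear in the question.
columns = ["Time", "Pos", "D1S1", "D1S2", "Temp", "RH", "D2S1"]

pattern = re.compile(r"D(\d+)S(\d+)")


def buggy_filter(names):
    """Same remove-while-iterating logic as the first function in the question."""
    names = list(names)  # work on a copy so the caller's list is untouched
    for name in names:
        if not pattern.match(name):
            # Removing shifts every later item one slot to the left,
            # so the item right after this one is never visited.
            names.remove(name)
    return names


result = buggy_filter(columns)
print(result)  # ['Pos', 'D1S1', 'D1S2', 'RH', 'D2S1']
```

'Pos' sits right after 'Time' and 'RH' right after 'Temp', so both are skipped and stay in the list. The same effect would explain the ==True experiment: in a run of consecutive matching names every second one is skipped, so roughly half of them survive.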

Consider this:

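arr = list(range(10))

# Plain iteration: prints 0 through 9.
for el in arr:
    print(el)

# Removing from arr inside the loop: each pass deletes the element
# that the loop would have visited next.
for i, el in enumerate(arr):
    print(el)
    arr.remove(arr[i+1])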

The second loop prints only the even numbers, because each iteration removes the element that would have come next.
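Two common ways to avoid the problem, sketched under some assumptions: the helper names and the sample column list below are hypothetical, and the functions take the column names directly (in the question you would pass something like x[0].columns). The first builds a new list with a comprehension; the second still removes in place but loops over a copy.

```python
import re

SENSOR_PATTERN = re.compile(r"D(\d+)S(\d+)")


def extract_sensor_columns(columns):
    """Return a new list with the names that start with D<number>S<number>."""
    return [name for name in columns if SENSOR_PATTERN.match(name)]


def remove_non_sensor_in_place(names):
    """Remove non-matching names from the list itself."""
    # names[:] is a shallow copy, so removals from `names` do not
    # disturb the sequence the loop is walking.
    for name in names[:]:
        if not SENSOR_PATTERN.match(name):
            names.remove(name)
    return names


# Hypothetical column names for illustration.
columns = ["Time", "Pos", "D1S1", "D1S2", "Temp", "RH", "D2S1"]

kept = extract_sensor_columns(columns)
in_place = remove_non_sensor_in_place(list(columns))
print(kept)
print(in_place)
```

The comprehension is usually the simpler choice; the copy-based loop is only worth it when other code holds a reference to the same list object and needs to see the change.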

answered Jan 13 '23 at 08:40

Ben Zeen


That's exactly it, thanks! Good to keep in mind for the future. Would upvote, but I'm too irreputable. – Jen Jan 13 '23 at 08:50

Glad it helped! Never understood the weird reputation requirements on this site... Just accept the answer I guess. – Ben Zeen Jan 13 '23 at 08:54
