Removing words containing digits from a given string

Question

I'm trying to write a simple program that removes all words containing digits from a received string.

Here is my current implementation:

import re

def checkio(text):

    text = text.replace(",", " ").replace(".", " ").replace("!", " ").replace("?", " ").lower()
    words = text.split()

    print(words)

    for each in words:
        if re.search(r'\d', each):
            words.remove(each)

    print(words)

checkio("1a4 4ad, d89dfsfaj.")

However, when I execute this program, I get the following output:

['1a4', '4ad', 'd89dfsfaj']
['4ad']

I can't figure out why '4ad' is printed on the second line, since it contains digits and should have been removed from the list. Any ideas?

You're modifying the list while iterating over it. See this question for why you shouldn't do that: http://stackoverflow.com/questions/10812272/modifying-a-list-while-iterating-over-it-why-not — bgporter, May 08 '15 at 12:36
What exactly are you trying to accomplish? What are your constraints and conditions? — Malik Brahimi, May 08 '15 at 12:36
It's not so much about constraints as understanding why this is going wrong. I see now that I'm modifying a list that I'm iterating over and that makes sense. — Rob, May 08 '15 at 13:28

score 0 · Answer 1 · answered May 08 '15 at 12:40

If you are testing for alphanumeric strings, why not use isalnum() instead of a regex?

In [1695]: x = ['1a4', '4ad', 'd89dfsfaj']

In [1696]: [word for word in x if not word.isalnum()]
Out[1696]: []
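Note that str.isalnum() is True when every character is a letter or a digit, so on its own it does not single out words that contain digits. Below is a minimal regex-free sketch, assuming the goal is the one stated in the question (drop any word with at least one digit). The helper name has_digit and the extra sample word 'hello' are made up for illustration.

```python
def has_digit(word):
    # True if at least one character of the word is a digit.
    return any(ch.isdigit() for ch in word)

words = ['1a4', '4ad', 'd89dfsfaj', 'hello']

# isalnum() is True for all four words, so it cannot tell them apart.
print([w.isalnum() for w in words])

# Keep only the words that contain no digits.
kept = [w for w in words if not has_digit(w)]
print(kept)
```

Using any() with str.isdigit() stops at the first digit found, which is roughly what re.search(r'\d', word) does in the other answers.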

fixxxer


Julien Spronck · Accepted Answer · 2015-05-08T12:52:12.517

Assuming that your regular expression does what you want, you can build a new list with a list comprehension instead of removing elements while iterating.

import re

def checkio(text):

    text = re.sub(r'[,.?!]', ' ', text).lower()
    words = [w for w in text.split() if not re.search(r'\d', w)]
    print(words)  # prints [] for the input in the question

Also, note that I simplified your text = text.replace(...) line.

Additionally, if you do not need to reuse your text variable, you can use regex to split it directly.

import re

def checkio(text):

    # Split on whitespace as well as punctuation; "if w" drops empty strings.
    words = [w for w in re.split(r'[\s,.?!]+', text.lower()) if w and not re.search(r'\d', w)]
    print(words)  # prints [] for the input in the question

score 0 · Answer 3 · answered May 08 '15 at 12:47

This can be done with re.sub, re.search, and a list comprehension.

>>> import re
>>> def checkio(s):
...     print([i for i in re.sub(r'[.,!?]', ' ', s.lower()).split() if not re.search(r'\d', i)])
...
>>> checkio("1a4 4ad, d89dfsfaj.")
[]
>>> checkio("1a4 ?ad, d89dfsfaj.")
['ad']

Martin Boyanov · Answer 4 · 2015-05-08T12:48:34.230

-1

The problem is that you are removing elements from the list while iterating over it. Python does not raise an error here; the loop just skips elements.

At the first iteration, words = ['1a4', '4ad', 'd89dfsfaj']. Since '1a4' contains a digit, we remove it, so words = ['4ad', 'd89dfsfaj']. At the second iteration, the loop's internal index is 1, so the current word is 'd89dfsfaj', and we remove it too. '4ad' is skipped because it moved to index 0 after the first removal. The list now has one element and the index is 2, so the loop ends with words = ['4ad'].
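The skipping can be reproduced with a short sketch. This is not the original poster's code: the function names buggy_filter and safe_filter are hypothetical, and the second one iterates over a shallow copy of the list (words[:]) so that removals do not shift the elements still to be visited.

```python
import re

def buggy_filter(words):
    # Removes from the same list it is iterating over, so elements get skipped.
    words = list(words)  # work on a copy of the caller's list
    for each in words:
        if re.search(r'\d', each):
            words.remove(each)
    return words

def safe_filter(words):
    # Iterates over a shallow copy, so removing from the original is safe.
    words = list(words)
    for each in words[:]:
        if re.search(r'\d', each):
            words.remove(each)
    return words

sample = ['1a4', '4ad', 'd89dfsfaj']
print(buggy_filter(sample))  # ['4ad']
print(safe_filter(sample))   # []
```

Building a new list with a comprehension, as in the other answers, avoids the problem entirely and is usually the simpler choice.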

edited May 08 '15 at 12:48

answered May 08 '15 at 12:38


re.search returns a re.MatchObject – Julien Spronck May 08 '15 at 12:46
