I'm trying to write a simple program that removes all words containing digits from a received string.

Here is my current implementation:

import re

def checkio(text):

    text = text.replace(",", " ").replace(".", " ").replace("!", " ").replace("?", " ").lower()
    counter = 0
    words = text.split()

    print(words)

    for each in words:
        if bool(re.search(r'\d', each)):
            words.remove(each)

    print(words)

checkio("1a4 4ad, d89dfsfaj.")

However, when I execute this program, I get the following output:

['1a4', '4ad', 'd89dfsfaj']
['4ad']

I can't figure out why '4ad' is printed in the second line as it contains digits and should have been removed from the list. Any ideas?

Yoel
Rob
  • Add your expected output too – Bhargav Rao May 08 '15 at 12:33
  • You're modifying the list while iterating over it. See this question for why you shouldn't do that: http://stackoverflow.com/questions/10812272/modifying-a-list-while-iterating-over-it-why-not – bgporter May 08 '15 at 12:36
  • What exactly are you trying to accomplish? What are your constraints and conditions? – Malik Brahimi May 08 '15 at 12:36
  • It's not so much about constraints as understanding why this is going wrong. I see now that I'm modifying a list that I'm iterating over and that makes sense. – Rob May 08 '15 at 13:28

4 Answers

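If you only need to check for digits, you can use string methods instead of a regex. Note that isalnum() is not the right test: it is True for any word made up only of letters and digits, so filtering on not word.isalnum() would also drop plain words such as 'abc'. Checking the characters with isdigit() does what you want:

In [1695]: x = ['1a4', '4ad', 'd89dfsfaj']

In [1696]: [word for word in x if not any(c.isdigit() for c in word)]
Out[1696]: []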
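
For completeness, here is a minimal regex-free sketch of the whole task, assuming the goal is to drop every word that contains at least one digit. The function name words_without_digits is made up for this example; the punctuation handling mirrors the replace() chain in the question.

```python
def words_without_digits(text):
    # Hypothetical helper, not part of the question's code.
    # Replace the same four punctuation marks as the question does.
    for ch in ",.!?":
        text = text.replace(ch, " ")
    # Build a new list instead of removing items from the one being iterated.
    return [w for w in text.lower().split()
            if not any(c.isdigit() for c in w)]


print(words_without_digits("1a4 4ad, d89dfsfaj."))        # []
print(words_without_digits("Hello w0rld, how are you?"))  # ['hello', 'how', 'are', 'you']
```

Because the result is a new list, nothing is removed from a list while it is being traversed.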
fixxxer

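Assuming that your regular expression does what you want, you can build a new list with a list comprehension instead of removing items from the list while iterating over it.

import re

def checkio(text):

    # replace the punctuation marks with spaces, then lowercase
    text = re.sub(r'[,.?!]', ' ', text).lower()
    # keep only the words that contain no digit
    words = [w for w in text.split() if not re.search(r'\d', w)]
    print(words) ## prints [] in this case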

Also, note that I simplified your text = text.replace(...) line.

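Additionally, if you do not need to reuse your text variable, you can split it directly with a regex. Include whitespace in the character class so that the text is split on spaces as well as on punctuation.

import re

def checkio(text):

    # split on any run of whitespace or punctuation; "if w" drops empty strings
    words = [w for w in re.split(r'[\s,.?!]+', text.lower()) if w and not re.search(r'\d', w)]
    print(words) ## prints [] in this case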
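
When splitting with re.split, it matters whether whitespace is part of the separator pattern. The sketch below uses a made-up sample string (not from the question) to compare a punctuation-only pattern with one that also includes \s; the variable names are only for illustration.

```python
import re

# Sample input chosen for this illustration only.
text = "hello world, foo1 bar."

# Punctuation-only separators: words joined by spaces stay in one chunk.
punct_only = re.split(r'[,.?!]', text)
print(punct_only)   # ['hello world', ' foo1 bar', '']

# Whitespace added, with + to treat a run of separators as one.
with_space = re.split(r'[\s,.?!]+', text)
print(with_space)   # ['hello', 'world', 'foo1', 'bar', '']

# Filtering out empty strings and words with digits gives the final list.
words = [w for w in with_space if w and not re.search(r'\d', w)]
print(words)        # ['hello', 'world', 'bar']
```

With the punctuation-only pattern, the digit in 'foo1' would cause the whole chunk ' foo1 bar' to be dropped, taking 'bar' with it.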
Julien Spronck

This can be done with re.sub, re.search and a list comprehension.

>>> import re
>>> def checkio(s):
        print([i for i in re.sub(r'[.,!?]', ' ', s.lower()).split() if not re.search(r'\d', i)])


>>> checkio("1a4 4ad, d89dfsfaj.")
[]
>>> checkio("1a4 ?ad, d89dfsfaj.")
['ad']
Avinash Raj

What happens is that you are modifying the list while iterating over it: each removal shifts the remaining elements to the left while the loop's position keeps advancing.

At the first iteration we have words = ['1a4', '4ad', 'd89dfsfaj']. Since '1a4' contains a digit, we remove it, so words = ['4ad', 'd89dfsfaj']. At the second iteration the loop looks at index 1, which is now 'd89dfsfaj', and we remove that too. '4ad' is skipped because it has moved to index 0, which the loop has already passed. The list now has one element and the next index would be 2, so the loop ends with words = ['4ad'].
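
The walk-through above can be reproduced with a short sketch. Both function names are made up for illustration. The first repeats the loop from the question; the second shows one common fix, iterating over a shallow copy (words[:]) while removing from the original. Building a new list with a comprehension is usually the simpler option.

```python
import re


def remove_in_place_buggy(words):
    # Hypothetical name: same loop as in the question.
    for each in words:
        if re.search(r'\d', each):
            # Removing shifts later items left, so the loop skips the next one.
            words.remove(each)
    return words


def remove_over_copy(words):
    # Hypothetical name: iterate over a copy so removals cannot cause skips.
    for each in words[:]:
        if re.search(r'\d', each):
            words.remove(each)
    return words


print(remove_in_place_buggy(['1a4', '4ad', 'd89dfsfaj']))  # ['4ad']
print(remove_over_copy(['1a4', '4ad', 'd89dfsfaj']))       # []
```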

Martin Boyanov