I am trying to filter a list of strings with regular expressions, as shown in this answer. However, the code gives an unexpected result:

In [123]: r = re.compile('[0-9]*')
In [124]: string_list = ['123', 'a', '467','a2_2','322','21']
In [125]: filter(r.match, string_list)
Out[125]: ['123', 'a', '467', 'a2_2', '322', '21']

I expected the output to be ['123', '467', '322', '21'].

Wiktor Stribiżew
Mike Vella

2 Answers

The problem is that your pattern uses the * quantifier, which matches zero or more digits. So even if the string doesn't contain a digit at all, it will still match the pattern, with an empty match at the start of the string. That is why a and a2_2 pass the filter. Furthermore, r.match only requires a match at the beginning of the string, not over the whole string, so without an end anchor a string such as 12a would still be accepted because it starts with digits.
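
To see what is actually being matched, here is a small sketch that prints the matched text for each string in the question's list. The name matched_text is only illustrative:

```python
import re

string_list = ['123', 'a', '467', 'a2_2', '322', '21']
r = re.compile('[0-9]*')

# r.match never returns None here: where there are no leading digits,
# it succeeds with a zero-length match at position 0.
matched_text = [r.match(s).group() for s in string_list]
print(matched_text)  # ['123', '', '467', '', '322', '21']
```

An empty match is still a match object, which is truthy, so filter keeps every element.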

Try using this pattern

^[0-9]+$

Or more simply:

^\d+$

This will match one or more digits. The start (^) and end ($) anchors ensure that no other characters will be allowed within the string.
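
Applied to the list from the question, the anchored pattern could be used roughly like this. This is a sketch: the names digits_only and digits_only_alt are illustrative, and the fullmatch variant assumes Python 3.4 or later:

```python
import re

string_list = ['123', 'a', '467', 'a2_2', '322', '21']
r = re.compile(r'^\d+$')

# list() is needed on Python 3, where filter returns a lazy iterator.
digits_only = list(filter(r.match, string_list))
print(digits_only)  # ['123', '467', '322', '21']

# Alternative without explicit anchors: fullmatch must consume the whole string.
digits_only_alt = list(filter(re.compile('[0-9]+').fullmatch, string_list))
print(digits_only_alt)  # ['123', '467', '322', '21']
```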

p.s.w.g

Is there really a need for Regex here? You have str.isdigit:

>>> string_list = ['123', 'a', '467','a2_2','322','21']
>>> [x for x in string_list if x.isdigit()]
['123', '467', '322', '21']
>>>
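
For reference, a quick sketch of how str.isdigit treats a few edge cases. The sample strings are made up for illustration and are not from the question:

```python
samples = ['23', '1e6', '-5', '', '3.0']

# isdigit is True only when the string is non-empty and every character is a digit,
# so signs, exponents, decimal points and the empty string all give False.
results = {s: s.isdigit() for s in samples}
print(results)
# {'23': True, '1e6': False, '-5': False, '': False, '3.0': False}
```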
  • In this specific example, no there isn't, but I wanted to know why it wasn't working the way I expected. – Mike Vella Dec 22 '13 at 21:06
  • What will happen with let's say `1e6`? :) –  Dec 22 '13 at 21:06
  • @Allendar `>>> '23e1'.isdigit()` is `False` – Mike Vella Dec 22 '13 at 21:08
  • @Allendar - Yes, that will fail. :) However, judging by his original Regex pattern, it doesn't look like the OP has numbers like that. –  Dec 22 '13 at 21:08
  • Sounds reasonable. I guess it's nice for variation too to see different approaches as a solution :). Btw 1e6 seems to return a float here in both Python 2 and 3. –  Dec 22 '13 at 21:09
  • I am a bit confused about Python's definition of 'digit'. `"23".isdigit()` would be false in my understanding of a 'digit', because "23" is actually two digits. I would expect this to be true only for strings that match the pattern `^[0-9]$`. – vlad_tepesch Jul 01 '16 at 13:18