Say I have a string like

"There are LJFK$(#@$34)(,0,ksdjf apples in the (4,5)"  

I want to be able to dynamically extract the numbers into a list: [34, 0, 4, 5].
Is there an easy way to do this in Python?

In other words, is there some way to extract contiguous runs of digits separated by arbitrary delimiters?

pradyunsg
John Smith
  • Possible dup http://stackoverflow.com/questions/4289331/python-extract-numbers-of-a-string – Shmil The Cat Apr 01 '13 at 15:10
  • If the string were `"12.34"`, would you want `[12, 34]` or `[12.34]`? IOW, is it only contiguous-digit integers you want? – DSM Apr 01 '13 at 15:15
  • In this case it would be [12, 34], integers. The current answer works as desired (I just can't accept it yet) – John Smith Apr 01 '13 at 15:18

2 Answers

7

Sure, use regular expressions:

>>> s = "There are LJFK$(#@$34)(,0,ksdjf apples in the (4,5)"
>>> import re
>>> list(map(int, re.findall(r'[0-9]+', s)))
[34, 0, 4, 5]
phihag
  • Using a list comprehension is usually preferable to using `map`. Especially since you're just casting the result to a list anyway. – Cairnarvon Apr 01 '13 at 15:12
  • @Cairnarvon It usually is, except if you can simply call an existing function (because then you don't have to figure out the name of a temporary variable). The list creation is just for the nice output. If you were to iterate over the result, you could obviously skip it. – phihag Apr 01 '13 at 15:15
  • You could have used `\d+` for the regex too. – pradyunsg Apr 01 '13 at 16:41
  • @Schoolboy Yes, but then one would have to use something significantly more complicated than `int` to support inputs like `٣٤`. – phihag Apr 01 '13 at 16:48
  • @phihag Why is that? How would those inputs get through the filter? – pradyunsg Apr 01 '13 at 16:56
  • @Schoolboy `\d` matches arbitrary digits (and [is **not** equivalent to `[0-9]`](http://stackoverflow.com/a/6479605/35070)), and those characters [are](http://www.fileformat.info/info/unicode/char/663/index.htm) [digits](http://www.fileformat.info/info/unicode/char/664/index.htm). Therefore, [`'٣٤'` matches `r'\d'`](http://ideone.com/3k1I43). – phihag Apr 01 '13 at 17:04
  • @phihag Ok, now I get it. Thanks learned something new.. :) – pradyunsg Apr 01 '13 at 17:15
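
The difference between `\d` and `[0-9]` discussed in these comments can be checked directly. The following is a minimal sketch, assuming Python 3 and a `str` pattern (where `\d` is Unicode-aware by default); the sample string is made up for illustration and is not from the question. In Python 3, `int` does accept such Unicode decimal digits, and the `re.ASCII` flag restricts `\d` to `[0-9]`.

```python
import re

# Hypothetical sample: two Arabic-Indic digits (U+0663, U+0664), then ASCII digits.
s = "abc \u0663\u0664 def 12"

# With a str pattern, \d matches any Unicode decimal digit.
unicode_aware = re.findall(r"\d+", s)

# The explicit character class only matches ASCII digits.
ascii_class = re.findall(r"[0-9]+", s)

# re.ASCII makes \d behave like [0-9].
ascii_flag = re.findall(r"\d+", s, flags=re.ASCII)

print(len(unicode_aware))                # 2
print(ascii_class)                       # ['12']
print(ascii_flag)                        # ['12']
print([int(m) for m in unicode_aware])   # [34, 12]
```

Which behaviour is preferable depends on whether non-ASCII digits should count as numbers in the input at hand.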
2

You can also do this without regular expressions, although it requires some more work:

>>> s = "There are LJFK$(#@$34)(,0,ksdjf apples in the (4,5)"
>>> # replace non-digit characters with a space
>>> s = "".join(x if x.isdigit() else " " for x in s)
>>> print(s)
                   34   0                      4 5
>>> # get the separate digit strings
>>> digitStrings = s.split()
>>> print(digitStrings)
['34', '0', '4', '5']
>>> # convert the strings to numbers (list() is needed in Python 3, where map is lazy)
>>> numbers = list(map(int, digitStrings))
>>> print(numbers)
[34, 0, 4, 5]
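
As a variation on the same regex-free idea, the three steps can be collapsed into a single pass with `itertools.groupby`, which groups consecutive characters by whether they are digits. This is only a sketch of an alternative, not the method used above; the helper name `extract_ints` and the `ASCII_DIGITS` constant are hypothetical. It deliberately tests against ASCII digits rather than `str.isdigit`, because `isdigit` is also true for characters such as `²` that `int` rejects.

```python
from itertools import groupby

# Assumption: only ASCII digits count as part of a number.
ASCII_DIGITS = set("0123456789")


def extract_ints(text):
    """Return one int per contiguous run of ASCII digits in text."""
    return [
        int("".join(run))
        for is_digit, run in groupby(text, key=lambda ch: ch in ASCII_DIGITS)
        if is_digit  # skip the runs of non-digit characters
    ]


s = "There are LJFK$(#@$34)(,0,ksdjf apples in the (4,5)"
print(extract_ints(s))        # [34, 0, 4, 5]
print(extract_ints("12.34"))  # [12, 34]
```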
Kevin