2

What is the cleanest way to obtain a list of the numeric values in a string?

For example:

string = 'version_4.11.2-2-1.4'
array  = [4, 11, 2, 2, 1, 4]

As you might understand, I need to compare versions.

By "cleanest", I mean as simple / short / readable as possible.

Also, if possible, I would prefer built-in functions over regular expressions (import re).

This is what I've got so far, but I feel that it is rather clumsy:

array = [int(n) for n in ''.join(c if c.isdigit() else ' ' for c in string).split()]

Strangely enough, I have not been able to find an answer on SO:

  • In this question, the input numeric values are assumed to be separated by white spaces
  • In this question, the input numeric values are assumed to be separated by white spaces
  • In this question, the user only asks for a single numeric value at the beginning of the string
  • In this question, the user only asks for a single numeric value of all the digits concatenated

Thanks

barak manos

5 Answers

6

Just match on consecutive digits:

map(int, re.findall(r'\d+', versionstring))

It doesn't matter what's between the digits; \d+ matches as many digits as can be found in a row. This gives you the desired output in Python 2:

>>> import re
>>> versionstring = 'version_4.11.2-2-1.4'
>>> map(int, re.findall(r'\d+', versionstring))
[4, 11, 2, 2, 1, 4]

If you are using Python 3, map() returns a lazy iterator (a map object) rather than a list, so either call list() on it or use a list comprehension:

[int(d) for d in re.findall(r'\d+', versionstring)]
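
Since the question mentions comparing versions, here is a minimal sketch of how the extracted numbers could be used for that. The helper name version_key and the second version string are assumptions made for illustration; they do not come from the question.

```python
import re

def version_key(text):
    """Return every run of digits in text as a tuple of ints."""
    return tuple(int(d) for d in re.findall(r'\d+', text))

old = version_key('version_4.9.2')          # hypothetical older version
new = version_key('version_4.11.2-2-1.4')   # the string from the question

print(new)        # (4, 11, 2, 2, 1, 4)
print(old < new)  # True: tuples compare element by element, and 9 < 11

# Comparing the raw strings gets this wrong, because '9' sorts after '1'.
print('version_4.9.2' < 'version_4.11.2-2-1.4')  # False
```

Tuples (or lists) of ints compare element by element, which is usually what a version comparison needs.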
Martijn Pieters
  • Thanks. I'm using Python 2. So I take it (also from all the other answers) that there is not much more which can be achieved with built-in functions, and that `re` is the cleanest solution. Is that correct? – barak manos Mar 28 '16 at 12:49
  • @barakmanos `re` is part of the standard library, treat it like a built-in. It is the cleanest solution for your specific problem, yes. – Martijn Pieters Mar 28 '16 at 12:50
5

I'd solve this with a regular expression, too.

I prefer re.finditer over re.findall for this task. re.findall returns a list, re.finditer returns an iterator, so with this solution you won't create a temporary list of strings:

>>> import re
>>> string = 'version_4.11.2-2-1.4'
>>> [int(x.group()) for x in re.finditer(r'\d+', string)]
[4, 11, 2, 2, 1, 4]
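
The iterator pays off most when not every number is needed. As a rough sketch (the variable names major and minor are assumptions for illustration), a generator expression over re.finditer converts matches only as they are requested:

```python
import re

string = 'version_4.11.2-2-1.4'

# Generator expression: each match is converted only when it is requested.
numbers = (int(m.group()) for m in re.finditer(r'\d+', string))

major = next(numbers, None)  # first number, or None if the string has no digits
minor = next(numbers, None)  # second number

print(major, minor)          # 4 11
rest = list(numbers)         # the remaining numbers, consumed on demand
print(rest)                  # [2, 2, 1, 4]
```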
timgeb
0

Regex is definitely the best way to go, as @MartijnPieters' answer clearly shows. If you don't want to use it, a one-line list comprehension gets hard to read, so an explicit loop may be the clearer choice:

def getnumbers(string):
    numberlist = []
    substring = ""
    for char in string:
        if char.isdigit():
            substring += char
        elif substring:
            numberlist.append(int(substring))
            substring = ""
    if substring:
        numberlist.append(int(substring))
    return numberlist
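
For comparison, a shorter variant without re is possible with itertools.groupby, which groups consecutive characters by whether they are digits. This is only a sketch of an alternative, not the method used above; the function name getnumbers_grouped is an assumption, and it assumes ASCII digits, because str.isdigit() also accepts some Unicode characters (superscripts, for example) that int() rejects.

```python
from itertools import groupby

def getnumbers_grouped(string):
    # groupby yields (key, group) pairs for consecutive characters
    # that share the same str.isdigit() result.
    return [int(''.join(group))
            for is_digit, group in groupby(string, key=str.isdigit)
            if is_digit]

print(getnumbers_grouped('version_4.11.2-2-1.4'))  # [4, 11, 2, 2, 1, 4]
```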
zondo
0

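Tracking every character, checking whether it is a digit, and appending it to a buffer can get slow for larger strings.

A regular expression handles this in one call, for example:

import re
string = 'version_4.11.2-2-1.4.9.7.5.43.2.57.9.5.3.46.8.5'
l = list(map(int, re.findall(r'\d+', string)))  # list() is only needed on Python 3
print(l)

The r prefix used in the answer above marks a raw string, so the backslash in \d reaches the regex engine unchanged. '\d+' happens to work without it, but r'\d+' is the safer habit.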
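
What the r prefix changes can be checked directly. The snippet below is a small illustration (the sample text 'a1 22 b3' and the variable names are assumptions): in a normal string literal '\b' is a backspace character, while in a raw string it stays a backslash followed by b, which re interprets as a word boundary.

```python
import re

# Without the r prefix, the backslash has to be doubled to get the same pattern.
same_pattern = ('\\d+' == r'\d+')
print(same_pattern)                 # True

# '\b' is one character (backspace); r'\b' is two (backslash + 'b').
plain_b, raw_b = '\b', r'\b'
print(len(plain_b), len(raw_b))     # 1 2

text = 'a1 22 b3'
whole_numbers = re.findall(r'\b\d+\b', text)     # word boundaries
backspace_try = re.findall('\b\\d+\b', text)     # literal backspace characters
print(whole_numbers)                # ['22']
print(backspace_try)                # []
```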

Ajinkya Patil
0

You can solve this with a regular expression:

import re
string = 'version_4.11.2-2-1.4'
p = re.compile(r'\d+')
# findall returns strings, so convert each match to int
[int(n) for n in p.findall(string)]
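
A compiled pattern is convenient when the same extraction runs on many strings. The following is a sketch under assumptions: the names DIGITS and numbers and the list of version strings are made up for illustration. Note that re also caches recently used patterns internally, so the gain is mostly in readability.

```python
import re

DIGITS = re.compile(r'\d+')  # compiled once, reused for every string

def numbers(text):
    """Return the runs of digits in text as a list of ints."""
    return [int(n) for n in DIGITS.findall(text)]

# Hypothetical version strings; the numeric lists serve as the comparison key.
versions = ['version_4.9.2', 'version_4.11.2-2-1.4', 'version_4.2']
newest = max(versions, key=numbers)
print(newest)  # version_4.11.2-2-1.4
```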
Arshad