Best way to get split version number with regular expression

Question

I have this string (it's part of a file):

{
    return array(
        'major'     => '1',
        'minor'     => '9',
        'revision'  => '1',
        'patch'     => '1',
        'stability' => '',
        'number'    => '',
    );
}

I need to form a proper version number out of this, in this case "1.9.1.1". I have already written the code doing this, but I would like to know if there is a better, more beautiful solution, or one that requires less code. I've been thinking about using a more complex regular expression that returns all parts of the version number, but I couldn't figure out how, and returning a match like "1911" might cause more trouble than it's worth when there's a two-digit number involved, e.g. "1.10.1.1". In this case it would be impossible to know where to split the "11011" as it might as well be "11.0.1.1" or "1.1.0.11".

Here's what I've got (in Python code):

        result = []
        result.append(re.search("'major'\\s+=>\\s+'(\\d+)'", text))
        result.append(re.search("'minor'\\s+=>\\s+'(\\d+)'", text))
        result.append(re.search("'revision'\\s+=>\\s+'(\\d+)'", text))
        result.append(re.search("'patch'\\s+=>\\s+'(\\d+)'", text))

        str = ""
        for res in result:
            if res:
                str += res.group(1) + "."

        return str[:-1]

If you are guaranteed the order and content layout, then you *could* just look for numbers, but that introduces assumptions and might not be as clear and robust as the one you currently have. — npinti, Sep 28 '15 at 07:33
instead of the string concatenation beginning at `str = ""` you could just do `return '.'.join(result)` — StefanNch, Sep 28 '15 at 07:36
@StefanNch not quite, as `result` contains a mix of `None` and regex match objects. — jonrsharpe, Sep 28 '15 at 07:37
@npinti; The code snippet is part of a larger file, so I'd like to avoid using a regular expression that might match something I'm not looking for. The order of "major" -> "minor" -> "revision" -> "patch" should be unchanging, though! — R.G., Sep 28 '15 at 07:38
Result must be [`1.9.1.1`](http://ideone.com/rEWafk), right? — Wiktor Stribiżew, Sep 28 '15 at 07:44
@stribizhev Ah yes, you're right! I accidentally wrote "1.9.2.1" in my question due to copy-pasting. Will edit! — R.G., Sep 28 '15 at 07:47
You can turn the for loop into a list comprehension, like `".".join([each.group(1) for each in result if each])` — Vineesh, Sep 28 '15 at 07:48
I might be inclined to build a simple class, maybe with `__slots__`, to do this - it would `__init__` from the six parts, `__str__` back to the human-friendly representation, and define a `from_str` class method that does the parsing. — jonrsharpe, Sep 28 '15 at 07:49
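
A minimal sketch of the class idea from the comment above, not the commenter's actual code. The name `Version`, the `_PAIR` pattern, and the choice to leave `stability` and `number` out of `__str__` are all assumptions made for illustration.

```python
import re


class Version:
    """Hypothetical container for the six fields of the PHP array."""

    __slots__ = ("major", "minor", "revision", "patch", "stability", "number")

    # Matches one "'key' => 'value'" pair; the value may be empty.
    _PAIR = re.compile(r"'(\w+)'\s*=>\s*'([^']*)'")

    def __init__(self, major, minor, revision, patch, stability="", number=""):
        self.major = major
        self.minor = minor
        self.revision = revision
        self.patch = patch
        self.stability = stability
        self.number = number

    @classmethod
    def from_str(cls, text):
        # Collect every pair, then keep only the fields this class knows about.
        found = dict(cls._PAIR.findall(text))
        return cls(**{name: found.get(name, "") for name in cls.__slots__})

    def __str__(self):
        # Assumption: only the four numeric parts form the version string,
        # and empty parts are skipped, as in the question's own code.
        parts = (self.major, self.minor, self.revision, self.patch)
        return ".".join(p for p in parts if p)


sample = """{
    return array(
        'major'     => '1',
        'minor'     => '10',
        'revision'  => '1',
        'patch'     => '1',
        'stability' => '',
        'number'    => '',
    );
}"""

version = Version.from_str(sample)
print(version)
```

Because each part is stored separately, a two-digit part such as `10` never becomes ambiguous.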

score 2 · Accepted Answer · answered Sep 28 '15 at 07:47

You can use a regex that will capture all the numeric values from the consecutive array elements with `re.findall` and then join the captured numbers with `.`:

import re
s = """{
    return array(
        'major'     => '1',
        'minor'     => '9',
        'revision'  => '1',
        'patch'     => '1',
        'stability' => '',
        'number'    => '',
    );
}
"""
ptn = r"return\s+array\s*\(\s*'major'\s*=>\s*'(\d*)',\s*'minor'\s*=>\s*'(\d*)',\s*\s*'revision'\s*=>\s*'(\d*)',\s*\s*'patch'\s*=>\s*'(\d*)"
print (".".join(*re.findall(ptn, s)))

See IDEONE demo

Wiktor Stribiżew


The regex is rather basic, it consists of literals and shorthand classes like `\s` (whitespace) and `\d` (digits). I am using `*` quantifier to match 0 or more characters, but if you are sure there is at least 1 digit, you may replace it with `+`. If `return array (` is not obligatory at the beginning, you may remove `return\s+array\s*\(\s*` from the pattern. – Wiktor Stribiżew Sep 28 '15 at 07:50
Awesome, thanks! That's actually pretty close to what I was looking for and the regex looks like a more elaborate version of what I was initially trying to do. – R.G. Sep 28 '15 at 07:57
I also didn't know about the r prefix. Pretty useful! – R.G. Sep 28 '15 at 08:00
Oh yes, raw string literals simplify regex writing. See [*What exactly is a “raw string regex” and how can you use it?*](http://stackoverflow.com/questions/12871066/what-exactly-is-a-raw-string-regex-and-how-can-you-use-it) for more details. – Wiktor Stribiżew Sep 28 '15 at 08:02
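
The two points from these comments can be checked with a short sketch. The variable names and the generic `'\w+'` key pattern are assumptions made for illustration; they are not part of the answer's pattern.

```python
import re

# A raw string literal and a doubly escaped normal string are the same pattern.
raw_pattern = r"'minor'\s*=>\s*'(\d*)'"
escaped_pattern = "'minor'\\s*=>\\s*'(\\d*)'"
same = raw_pattern == escaped_pattern

line_with_value = "'minor'     => '10',"
line_without_value = "'stability' => '',"

# \d* accepts an empty value, \d+ requires at least one digit.
star = re.compile(r"'\w+'\s*=>\s*'(\d*)'")
plus = re.compile(r"'\w+'\s*=>\s*'(\d+)'")

star_value = star.search(line_with_value).group(1)
star_empty = star.search(line_without_value).group(1)
plus_empty = plus.search(line_without_value)

print(same, repr(star_value), repr(star_empty), plus_empty)
```

With `*` an empty value still matches and yields an empty group, while with `+` the whole match fails, so the choice decides whether an empty field produces an empty segment or no match at all.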

score 2 · Answer 2 · answered Sep 28 '15 at 07:51

If there is always only one block of version info in your large source file, using `re.findall` would be much simpler:

import re

s = '''{
    return array(
        'major'     => '1',
        'minor'     => '9',
        'revision'  => '1',
        'patch'     => '1',
        'stability' => '',
        'number'    => '',
    );
}'''


def get_version_number(s):
    version_fields = ('major', 'minor', 'revision', 'patch')
    version_dict = dict(re.findall(r"'(%s)'\s*=>\s*'(\d*)'" % '|'.join(version_fields), s))
    return '.'.join(version_dict.get(key, '') for key in version_fields)


if __name__ == '__main__':
    print(get_version_number(s))
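
One thing to be aware of with the dict approach: `version_dict.get(key, '')` keeps an empty segment when a field is missing or empty, which would give something like `1.10..3`. A possible variant that skips such fields, as the code in the question does, is sketched below. The names `FIELDS`, `PAIR` and `partial` are made up for this example.

```python
import re

FIELDS = ("major", "minor", "revision", "patch")
PAIR = re.compile(r"'(%s)'\s*=>\s*'(\d*)'" % "|".join(FIELDS))


def get_version_number(text):
    # Map each known field name to its captured digits (possibly empty).
    values = dict(PAIR.findall(text))
    # Assumption: empty or missing fields are dropped instead of joined.
    return ".".join(values[key] for key in FIELDS if values.get(key))


partial = "'major' => '1', 'minor' => '10', 'revision' => '', 'patch' => '3',"
print(get_version_number(partial))
```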

score 1 · Answer 3 · answered Sep 28 '15 at 08:02

I actually quite like your code, because it is very clear what you are trying to do. Putting everything in one big regex makes it harder to understand IMO. What you could do to clean it up a little is this:

import re
s = """{
    return array(
        'major'     => '1',
        'minor'     => '9',
        'revision'  => '1',
        'patch'     => '1',
        'stability' => '',
        'number'    => '',
    );
}
"""
baseregex = "'{}'\\s+=>\\s+'(\\d+)'"
keys = 'major', 'minor', 'revision', 'patch'
# re.search needs the text to search as its second argument.
result = [re.search(baseregex.format(key), s) for key in keys]
print('.'.join([res.group(1) for res in result if res]))

I see, yes, that's a valid point. This solution is also very helpful to me, I didn't know you could use format to "fill in" information in a regex. I've also never worked with inline for-in (still new to Python), but I really like that you can do things like this in just one line instead of two or three. — R.G., Sep 28 '15 at 08:14

score 1 · Answer 4 · answered Sep 28 '15 at 08:13

Actually you may not need `re`, especially if you subscribe to the "Now you have two problems" philosophy (http://regex.info/blog/2006-09-15/247)

Check this (s1 is your input string):

clean = lambda x: x.split('=>')[1].strip().rstrip(',').strip("'") \
    if '=>' in x else ''
version = '.'.join([clean(x) for x in s1.splitlines() if clean(x)])
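
Since the question says the snippet is part of a larger file, the one-liner above would also pick up any other line containing `=>`. A sketch of the same regex-free idea restricted to the four wanted keys follows. The names `WANTED` and `parse_version` and the extra `$unrelated` line in the sample are assumptions added for illustration.

```python
WANTED = ("major", "minor", "revision", "patch")


def parse_version(text):
    values = {}
    for line in text.splitlines():
        key, arrow, value = line.partition("=>")
        if not arrow:
            continue  # no "=>" on this line, so it is not a key/value pair
        key = key.strip().strip("'")
        value = value.strip().rstrip(",").strip("'")
        # Keep only the four version fields, and only when they have a value.
        if key in WANTED and value:
            values[key] = value
    return ".".join(values[k] for k in WANTED if k in values)


s1 = """{
    return array(
        'major'     => '1',
        'minor'     => '10',
        'revision'  => '1',
        'patch'     => '1',
        'stability' => '',
        'number'    => '',
    );
}
$unrelated = array('answer' => '42');
"""

version = parse_version(s1)
print(version)
```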

edited Sep 29 '15 at 05:57

answered Sep 28 '15 at 08:13

sureshvv


score 0 · Answer 5 · answered Sep 28 '15 at 13:58

You can do it this way:

import re

s = '''{
    return array(
        'major'     => '1',
        'minor'     => '9',
        'revision'  => '1',
        'patch'     => '1',
        'stability' => '',
        'number'    => '',
    );
}'''

version_list = ('major', 'minor', 'revision', 'patch')

version = []

for i in version_list:
    # Raw strings keep the backslashes intact; \d+ also matches multi-digit parts such as '10'.
    version.append(re.search(r"'(" + i + r")'\s+=>\s+'(\d+)'", s).group(2))

print('.'.join(version))
