I'm trying to write a Python library to parse our version format strings. The (simplified) version string format is as follows:
<product>-<x>.<y>.<z>[-alpha|beta|rc[.<n>][.<extra>]][.centos|redhat|win][.snb|ivb]
That is:
- product, e.g. foo
- numeric version, e.g. 0.1.0
- [optional] pre-release info, e.g. beta, rc.1, alpha.extrainfo
- [optional] operating system, e.g. centos
- [optional] platform, e.g. snb, ivb
So the following are valid version strings:
1) foo-1.2.3
2) foo-2.3.4-alpha
3) foo-3.4.5-rc.2
4) foo-4.5.6-rc.2.extra
5) withos-5.6.7.centos
6) osandextra-7.8.9-rc.extra.redhat
7) all-4.4.4-rc.1.extra.centos.ivb
For all of those examples, the following regex works fine:
^(?P<prod>\w+)-(?P<maj>\d)\.(?P<min>\d)\.(?P<bug>\d)(?:-(?P<pre>alpha|beta|rc)(?:\.(?P<pre_n>\d))?(?:\.(?P<pre_x>\w+))?)?(?:\.(?P<os>centos|redhat|win))?(?:\.(?P<plat>snb|ivb))?$
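For reference, this is roughly how I check it in Python. It is only a sketch: the names PATTERN, VALID and matches are made up for this example, and the dots between the version digits are escaped so that they only match a literal dot.

```python
import re

# The regex above, split across lines for readability.
# PATTERN, VALID and matches are illustrative names, not part of my library.
PATTERN = re.compile(
    r"^(?P<prod>\w+)-(?P<maj>\d)\.(?P<min>\d)\.(?P<bug>\d)"
    r"(?:-(?P<pre>alpha|beta|rc)(?:\.(?P<pre_n>\d))?(?:\.(?P<pre_x>\w+))?)?"
    r"(?:\.(?P<os>centos|redhat|win))?"
    r"(?:\.(?P<plat>snb|ivb))?$"
)

# Version strings 1) to 7) from the list above.
VALID = [
    "foo-1.2.3",
    "foo-2.3.4-alpha",
    "foo-3.4.5-rc.2",
    "foo-4.5.6-rc.2.extra",
    "withos-5.6.7.centos",
    "osandextra-7.8.9-rc.extra.redhat",
    "all-4.4.4-rc.1.extra.centos.ivb",
]

matches = {s: PATTERN.match(s) for s in VALID}
for s, m in matches.items():
    # Print the named groups, or None if the string did not match.
    print(s, "->", m.groupdict() if m else None)
```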
But the problem comes in versions of this type (no 'extra' pre-release information, but with os and/or platform):
8) issue-0.1.0-beta.redhat.snb
With the above regex, for string #8, redhat is being picked up in the pre-release extra info group pre_x, instead of the os group.
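A small reproduction of the problem, again only as a sketch (PATTERN and m are illustrative names, and the version-number dots are escaped):

```python
import re

# Same regex as above; PATTERN is an illustrative name.
PATTERN = re.compile(
    r"^(?P<prod>\w+)-(?P<maj>\d)\.(?P<min>\d)\.(?P<bug>\d)"
    r"(?:-(?P<pre>alpha|beta|rc)(?:\.(?P<pre_n>\d))?(?:\.(?P<pre_x>\w+))?)?"
    r"(?:\.(?P<os>centos|redhat|win))?"
    r"(?:\.(?P<plat>snb|ivb))?$"
)

m = PATTERN.match("issue-0.1.0-beta.redhat.snb")

# The optional pre_x group is tried before the os group, and \w+ happily
# matches "redhat", so the os group ends up empty.
print(m.group("pre_x"))  # redhat
print(m.group("os"))     # None
print(m.group("plat"))   # snb
```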
I tried using look-behind to avoid picking up the os or platform strings in pre_x:
...(?:\.(?P<pre_x>\w+))?(?<!centos|redhat|win|ivb|snb))...
That is:
^(?P<prod>\w+)-(?P<maj>\d)\.(?P<min>\d)\.(?P<bug>\d)(?:-(?P<pre>alpha|beta|rc)(?:\.(?P<pre_n>\d))?(?:\.(?P<pre_x>\w+))?(?<!centos|redhat|win|ivb|snb))?(?:\.(?P<os>centos|redhat|win))?(?:\.(?P<plat>snb|ivb))?$
This would work fine if Python's standard re module accepted variable-width look-behind. I would rather stick to the standard module than use the third-party regex module, as my library is quite likely to be distributed to a large number of machines, and I want to limit dependencies.
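This is what happens when I try to compile the look-behind version with the standard module. It is a sketch with illustrative names (LOOKBEHIND, error), and the version-number dots are escaped:

```python
import re

# The look-behind attempt from above. The alternatives have different
# lengths (6 characters for centos/redhat, 3 for win/ivb/snb).
LOOKBEHIND = (
    r"^(?P<prod>\w+)-(?P<maj>\d)\.(?P<min>\d)\.(?P<bug>\d)"
    r"(?:-(?P<pre>alpha|beta|rc)(?:\.(?P<pre_n>\d))?(?:\.(?P<pre_x>\w+))?"
    r"(?<!centos|redhat|win|ivb|snb))?"
    r"(?:\.(?P<os>centos|redhat|win))?"
    r"(?:\.(?P<plat>snb|ivb))?$"
)

error = None
try:
    re.compile(LOOKBEHIND)
except re.error as exc:
    # re rejects look-behind assertions whose alternatives differ in width.
    error = exc

print(error)
```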
I've also had a look at similar questions (this, this and this), but they are not applicable.
Any ideas on how to achieve this?
My regex101 link: https://regex101.com/r/bH0qI7/3
[For those interested, this is the full regex I'm actually using: https://regex101.com/r/lX7nI6/2]