I am looking for a regular expression that would match a simple custom version number scheme which is composed of an unlimited number of series of digits separated by single periods under the following constraints:
- Match a single line by asserting positions at both the start and the end of the line with the ^ and $ tokens. Since a version number is a single line, it doesn't make sense to do multi-line matching.
- No letters, whitespace, or special characters are allowed.
- The line cannot start or end with a period, and after the initial number each following series of digits has to be preceded by a single period.
- As previously mentioned, it should be assumed that a given scheme can have an unlimited number of categories, so the regular expression should be able to capture an unlimited number of groups, each representing a unique version category.
Scheme Example
<MajorVersion>.<MinorVersion>.<BuildNumber>
Captured Groups
$1 = MajorVersion, $2 = MinorVersion, $3 = BuildNumber
The above can be translated into an actual example:
Version number: 0.1.2 = { $1 = 0, $2 = 1, $3 = 2 }
Test Cases
Should pass - 010.98
Captured groups = { $1 = 010, $2 = 98 }
Should pass - 0.12.3344.2.1
Captured groups = { $1 = 0, $2 = 12, $3 = 3344, $4 = 2, $5 = 1 }
Should fail - 0 23.42 // Contains white-spaces
Should fail - 1.2..3.4 // Contains consecutive period symbols
Should fail - .2.58.6 // Starts with a period symbol
Should fail - 64#23.4 // Contains special characters
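The test cases above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper class named VersionChecks (the class and method names are illustrative only). It uses the two patterns quoted in the Current Solution section: one to validate, one to collect each series of digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
final class VersionChecks {
    // Validation pattern: digits, then zero or more ".digits" groups.
    private static final Pattern VALID = Pattern.compile("^\\d+(?:\\.\\d+)*$");
    // Capture pattern: a run of digits preceded by start of input or a period.
    private static final Pattern CATEGORY = Pattern.compile("(?<=^|\\.)\\d+");

    // True only if the whole string conforms to the version scheme.
    static boolean isValid(String version) {
        return VALID.matcher(version).matches();
    }

    // Returns each version category in order, or an empty list for invalid input.
    static List<String> categories(String version) {
        List<String> result = new ArrayList<>();
        if (!isValid(version)) {
            return result;
        }
        Matcher m = CATEGORY.matcher(version);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {
            "010.98", "0.12.3344.2.1", "0 23.42", "1.2..3.4", ".2.58.6", "64#23.4"
        };
        for (String s : samples) {
            System.out.println(s + " -> valid=" + isValid(s) + ", categories=" + categories(s));
        }
    }
}
```

Under these assumptions the first two samples should be reported as valid and the remaining four as invalid. Note that this sketch still scans the input twice, which is exactly the limitation described in the next section.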
Current Solution
I am trying to implement the parsing solution in Java and am not happy with my current solution that requires me to parse through the given version number String twice:
- Once to validate that the String is a valid version number that conforms to the constraints listed above with the following regex:
^\d+(?:\.\d+)*$
- Once to capture each series of digits as a separate version category using a positive lookbehind with the following regex:
(?<=^|\.)\d+
For those interested in providing a Java solution here is the code I'm using for testing:
// Constants in the RegExPatterns class; the validation pattern matches the regex quoted above
public static final Pattern SIMPLE_VERSION_NUMBER_MATCH = Pattern.compile("^\\d+(?:\\.\\d+)*$");
public static final Pattern SIMPLE_VERSION_NUMBER_GROUPS = Pattern.compile("(?<=^|\\.)\\d+");

@Test
public void testRegExMathCollection() {
    String versionNumber = "0.1.2.3";
    // First pass: validate the whole string
    Assertions.assertTrue(RegExPatterns.SIMPLE_VERSION_NUMBER_MATCH.matcher(versionNumber).find());
    // Second pass: collect each series of digits
    assertPatternMatchesGroups(RegExPatterns.SIMPLE_VERSION_NUMBER_GROUPS, versionNumber, "0", "1", "2", "3");
}

@TestOnly
private void assertPatternMatchesGroups(Pattern pattern, String text, String... groups) {
    String[] matches = RegExUtils.collectMatches(pattern.matcher(text));
    Assertions.assertArrayEquals(groups, matches);
}

// Helper in the RegExUtils class: collects every match found by the matcher, in order
public static String[] collectMatches(Matcher matcher) {
    List<String> matches = new java.util.ArrayList<>();
    while (matcher.find()) {
        matches.add(matcher.group());
    }
    return matches.toArray(new String[0]);
}
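For context on why two passes are used here: in java.util.regex, a capturing group placed inside a repetition retains only the text from its last iteration. The sketch below illustrates this with a hypothetical single-pass pattern and a hypothetical class name (neither is part of the code above). The pattern validates the input but does not expose every category.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
final class RepeatedGroupDemo {
    // Illustrative single-pass pattern: group 1 is the first number,
    // group 2 sits inside the repetition.
    private static final Pattern SINGLE_PASS =
            Pattern.compile("^(\\d+)(?:\\.(\\d+))*$");

    // Returns what group 2 holds after a full match, or null if the input
    // does not match or the repetition ran zero times.
    static String repeatedGroupValue(String version) {
        Matcher m = SINGLE_PASS.matcher(version);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Prints "2": the intermediate "1" is not retained by group 2.
        System.out.println(repeatedGroupValue("0.1.2"));
    }
}
```

The number of groups is fixed by the pattern itself (two in this sketch), no matter how many categories the input contains.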
Question Segment
My question to you has three parts:
- What is the best way to solve this problem using a single regular expression?
- If the above is not feasible, are there more optimal patterns than the ones I am currently using?
- If in your opinion regex is not the best approach, what Java implementation would you recommend to solve this problem?
Edit: Note that this is primarily a question about regular expressions. The main objective is to get a single regex that can both validate the version number according to the constraints above and capture the groups. I only asked for a better Java solution as a fallback in case what I want is not possible with regular expressions.