
I am looking for a regular expression that would match a simple custom version number scheme which is composed of an unlimited number of series of digits separated by single periods under the following constraints:

  • Match a single line by asserting positions at both start and end of line with ^ and $ tokens. Since a version number is a single line it doesn't make sense to do multi-line matching.

  • No letters, white-spaces, or special characters are allowed.

  • The line cannot start or end with a period and after the initial number each following series of digits has to be preceded with a single period.

As previously mentioned, a given scheme can have an unlimited number of categories, so the regular expression should be able to capture an unlimited number of groups, each representing a unique version category.

Scheme Example

<MajorVersion>.<MinorVersion>.<BuildNumber>

Captured Groups

$1 = MajorVersion, $2 = MinorVersion, $3 = BuildNumber

The above can be translated into an actual example:

Version number: 0.1.2 = { $1 = 0, $2 = 1, $3 = 2 }

Test Cases

Should pass - 010.98
Captured groups = { $1 = 010, $2 = 98 }

Should pass - 0.12.3344.2.1
Captured groups = { $1 = 0, $2 = 12, $3 = 3344, $4 = 2, $5 = 1 }

Should fail - 0 23.42    // Contains white-spaces
Should fail - 1.2..3.4   // Contains consecutive period symbols
Should fail - .2.58.6    // Starts with a period symbol
Should fail - 64#23.4    // Contains special characters

Current Solution

I am trying to implement the parsing solution in Java and am not happy with my current solution that requires me to parse through the given version number String twice:

  • Once to validate that the String is a valid version number that conforms to the constraints listed above with the following regex:
     ^\d+(?:\.\d+)*$
  • Once to capture each series of digits as a separate version category using positive lookbehind with the following regex:
     (?<=^|\.)\d+

For those interested in providing a Java solution here is the code I'm using for testing:


// Same validation regex as listed above: one or more digits per category.
public static final Pattern SIMPLE_VERSION_NUMBER_MATCH = Pattern.compile("^\\d+(?:\\.\\d+)*$");
public static final Pattern SIMPLE_VERSION_NUMBER_GROUPS = Pattern.compile("(?<=^|\\.)\\d+");

@Test
public void testRegExMathCollection() {

    String versionNumber = "0.1.2.3";
    Assertions.assertTrue(RegExPatterns.SIMPLE_VERSION_NUMBER_MATCH.matcher(versionNumber).find());
    assertPatternMatchesGroups(RegExPatterns.SIMPLE_VERSION_NUMBER_GROUPS, versionNumber, "0", "1", "2", "3");
}

@TestOnly
private void assertPatternMatchesGroups(Pattern pattern, String text, String... groups) {

    String[] matches = RegExUtils.collectMatches(pattern.matcher(text));
    Assertions.assertArrayEquals(groups, matches);
}

public static String[] collectMatches(Matcher matcher) {

    List<String> matches = new java.util.ArrayList<>();
    while (matcher.find()) {
        matches.add(matcher.group());
    }
    return matches.toArray(new String[0]);
}

Question Segment

My question to you has three parts:

  • What is the best way to solve this problem using a single regular expression?
  • If the above is not feasible, are there more optimal patterns than the ones I am currently using?
  • If in your opinion regex is not the best approach, what Java implementation would you recommend to solve this problem?

Edit: Note that this is primarily a question about regular expressions as the primary objective is to get a single regex that is able to both validate the version number according to constraints provided above as well as capture groups. I only asked for a better Java solution as a fallback in case what I want is not possible to do with regular expressions.
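One possible direction for the single-regex objective, offered as a sketch rather than a definitive solution: Java keeps only the last iteration of a repeated capturing group, so a variable number of categories is usually collected with repeated `find()` calls. The pattern below validates the whole string once with a lookahead at the start, then uses the `\G` boundary matcher to chain each subsequent match directly onto the previous one. The class name `SinglePassVersionParser`, the constant `VERSION_PART`, and the method `parse` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassVersionParser {

    // Hypothetical pattern, not from the original post.
    // Alternative 1: at the start of input, a lookahead checks that the
    //                whole string has the form digits(.digits)*.
    // Alternative 2: \G resumes exactly where the previous match ended and
    //                consumes the single period before the next category.
    //                (?!^) keeps this alternative from matching at the start.
    private static final Pattern VERSION_PART = Pattern.compile(
            "(?:^(?=\\d+(?:\\.\\d+)*$)|(?!^)\\G\\.)(\\d+)");

    /** Returns the version categories, or an empty list if the input is invalid. */
    public static List<String> parse(String version) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = VERSION_PART.matcher(version);
        while (matcher.find()) {
            parts.add(matcher.group(1)); // group 1 holds the digits only
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("0.12.3344.2.1")); // prints [0, 12, 3344, 2, 1]
        System.out.println(parse("1.2..3.4"));      // prints []
    }
}
```

With this approach an empty result signals an invalid version number, so validation and capture happen with one pattern and one pass of `find()` calls. The trade-off is readability: the pattern is harder to follow than the two separate regexes.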

Matthew
  • Possible duplicate of [How do you compare two version Strings in Java?](https://stackoverflow.com/questions/198431/how-do-you-compare-two-version-strings-in-java) – tkruse Jun 29 '19 at 23:23
  • @tkruse This question is primarily asking about a **regex** solution. I already have a working Java solution; the added question was to see if anyone has a better Java approach than what I'm already doing. – Matthew Jun 29 '19 at 23:28
  • It would help if your question listed more testcases, in particular invalid strings that you want to reject even though they are almost correct. – tkruse Jun 29 '19 at 23:46
  • I was thinking of including that, but I thought the post would be too long for most people to bother reading. I shall include the test cases now. – Matthew Jun 29 '19 at 23:51
  • Related: https://stackoverflow.com/questions/37003623, https://stackoverflow.com/questions/8843410 – tkruse Jun 29 '19 at 23:53
  • I've read that one and haven't found a solution there. – Matthew Jun 29 '19 at 23:56

2 Answers

2

Use "1.2.3.4.5".split("\\."), see other questions linked.

Using a regex is more useful when you need to find a pattern in a larger string, or when you need to check whether a String has a required format, or when the strings contain additional characters you want to ignore.

If you know all your inputs are well-formed, a regex provides no advantage over simple splitting.
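When the inputs are not known to be well-formed, splitting can be combined with the validation regex from the question. The following is a minimal sketch under that assumption; the class name `SplitVersionParser` and the method `parse` are hypothetical. Because `Matcher.matches()` requires the entire input to match, the `^` and `$` anchors are not needed here.

```java
import java.util.regex.Pattern;

public class SplitVersionParser {

    // Same shape as the question's validation regex, without anchors,
    // since matches() already requires the whole input to match.
    private static final Pattern VALID = Pattern.compile("\\d+(?:\\.\\d+)*");

    /** Validates the version string, then splits it into its categories. */
    public static String[] parse(String version) {
        if (!VALID.matcher(version).matches()) {
            throw new IllegalArgumentException("Not a valid version number: " + version);
        }
        // The argument to split is a regex, so the period must be escaped.
        return version.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", parse("0.12.3344.2.1"))); // prints 0,12,3344,2,1
    }
}
```

Without the validation step, `split` alone would not reject malformed input: `"1.2..3.4".split("\\.")` yields an empty string between `2` and `3` instead of failing.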

tkruse
  • This is a good Java substitution for my second regex: `(?<=^|\.)\d+`. It does the same only with less code to write and probably less overhead. I don't know why I didn't think of this earlier, guess I was too busy trying to force regex. – Matthew Jun 29 '19 at 23:31
1

I think you can use this Regex: \d+.\d+.\d+

If it does not work, do you have more examples to test it with?

Cambesa
  • I have reiterated multiple times that one of the key constraints is that the pattern has to match an **unlimited** number of version categories. So the regular expression you provided would not work for the following version number: `0.1.2.3.4`. – Matthew Jun 29 '19 at 22:16
  • All right, I thought you meant a maximum of two periods with an unlimited number of digits on either side of those periods. – Cambesa Jun 29 '19 at 22:18
  • Perhaps this is a better regex: (\d+[.])+\d+. When I test this, I get two results, a group 1 match and a full match. The full matches are always correct, so hopefully your program only accepts the full matches. – Cambesa Jun 29 '19 at 22:55
  • I don't see how that's a better regex than the ones I am currently using; it doesn't even properly isolate a single version category, as you can see here: https://regex101.com/r/0H7URu/1 – Matthew Jun 29 '19 at 23:04
  • For all intents and purposes that regex is the same as the first one I provided: `^\d+(?:\.\d+)*$`, the only difference being that yours reverses the order of matching, but the result is still the same. It's also important to note that it matches across multiple lines, which one of the constraints I listed does not allow, so it's missing the `^` and `$` tokens. – Matthew Jun 29 '19 at 23:17
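
The group behaviour discussed in these comments can be reproduced in Java with a short sketch. A capturing group inside a quantifier retains only its last iteration, so group 1 of `(\d+[.])+\d+` does not isolate each category. The class name `RepeatedGroupDemo` and the method `lastIteration` are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {

    // The pattern suggested in the comments above.
    private static final Pattern REPEATED = Pattern.compile("(\\d+[.])+\\d+");

    /** Returns what group 1 holds after a full match, or null if there is no match. */
    public static String lastIteration(String input) {
        Matcher matcher = REPEATED.matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Group 1 is overwritten on every repetition; only the final one survives.
        System.out.println(lastIteration("0.1.2.3")); // prints 2.
    }
}
```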