
I have a method that checks a given string against regular expressions in three stages, but the regexes do not reject some inputs that I expect them to reject. As far as I can tell the patterns are correct. Can someone please point out what I am doing wrong?

Here is the code:

public class App {
    public static void main(String[] args) {
        String identifier = "urn:abc:de:xyz:234567.1890123";

        if (identifier.matches("^urn:abc:de:xyz:.*")) {
            System.out.println("Match ONE");

            if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[0-9]{1,7}.*")) {
                System.out.println("Match TWO");

                if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[a-zA-Z0-9.-_]{1,20}$")) {
                    System.out.println("Match Three");
                }
            }
        }

    }
}

Ideally, this code should generate the output

Match ONE
Match TWO
Match Three

only when the identifier is "urn:abc:de:xyz:234567.1890123.abd12". However, it prints the same output even when the identifier does not have the intended format, for example for the following inputs:

"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ANC"
"urn:abc:de:xyz:234567.1890ACB.123"

I do not understand why it allows alphanumeric characters after the first `.`, or why it does not care about the characters after the second `.`.

I would like my Regex to check that the string has the following format:

  1. The string starts with urn:abc:de:xyz:
  2. Then come 6 to 12 digits [0-9] (e.g. 234567).
  3. Then comes a dot `.`
  4. Then come 1 to 7 digits [0-9] (e.g. 1890123).
  5. Then comes a second dot `.`
  6. Finally come 1 to 20 alphanumeric or special characters (e.g. ABC123.-_12).

This is a valid string for my regex: urn:abc:de:xyz:234567.1890123.ABC123.-_12

This is an invalid string for my regex, as it is missing the elements from point 6: urn:abc:de:xyz:234567.1890123

This is also an invalid string for my regex, as it is missing the elements from point 4 (it has ABC instead of digits): urn:abc:de:xyz:234567.1890ABC.ABC123.-_12

BATMAN_2008
  • See [the regex reference](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean). `.` not only matches the dot character. It matches almost everything. – Sweeper Feb 06 '21 at 08:43
  • Because `.` has special meaning in regex. That’s why your `.*` at the end works - it means (almost) any character 0 or more times. – Boris the Spider Feb 06 '21 at 08:43
  • @Sweeper How can I ensure there is a `.` and the required characters in my String? How can I modify my `regex` matching expression to check for these things? – BATMAN_2008 Feb 06 '21 at 08:57
  • @BoristheSpider Thanks for the response. How can I modify it to make it work and check for the things I need? – BATMAN_2008 Feb 06 '21 at 08:57
  • I'm not sure what you are checking for... What exactly do you want to match, and what do you not want to match? – Sweeper Feb 06 '21 at 08:58
  • @Sweeper I have modified the question and added the constraint which I am looking to add to my regex. Please let me know if you are able to understand and help me with this. – BATMAN_2008 Feb 06 '21 at 09:05
  • Is [that](https://regex101.com/r/SlM7E6/1) what you want? – Toto Feb 06 '21 at 11:22
  • Yup Thanks thats working :) – BATMAN_2008 Feb 06 '21 at 11:29

2 Answers


This part of the regex:

  • [0-9]{6,12}.[0-9]{1,7} matches 6 to 12 digits followed by any character followed by 1 to 7 digits
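
To see the difference in isolation, here is a minimal sketch that runs that fragment with and without the escape. The class and method names (`UnescapedDotDemo`, `matchesUnescaped`, `matchesEscaped`) are made up for illustration and are not part of the question's code.

```java
import java.util.regex.Pattern;

public class UnescapedDotDemo {
    // Unescaped: '.' is a wildcard for any character except line terminators.
    static final Pattern UNESCAPED = Pattern.compile("[0-9]{6,12}.[0-9]{1,7}");
    // Escaped: "\\." in a Java string literal is the regex \. (a literal dot).
    static final Pattern ESCAPED = Pattern.compile("[0-9]{6,12}\\.[0-9]{1,7}");

    static boolean matchesUnescaped(String s) {
        return UNESCAPED.matcher(s).matches();
    }

    static boolean matchesEscaped(String s) {
        return ESCAPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The first input uses a real dot, the second uses 'X' as the separator.
        for (String s : new String[] {"234567.1890123", "234567X1890123"}) {
            System.out.println(s
                    + " unescaped=" + matchesUnescaped(s)
                    + " escaped=" + matchesEscaped(s));
        }
        // prints:
        // 234567.1890123 unescaped=true escaped=true
        // 234567X1890123 unescaped=true escaped=false
    }
}
```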

To match a dot, it needs to be escaped. Try this:

^urn:abc:de:xyz:[0-9]{6,12}\.[0-9]{1,7}\.[a-zA-Z0-9\-_]{1,20}$
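
In Java source the backslash itself has to be escaped inside a string literal, so `\.` is written as `\\.`. Below is a minimal sketch of the pattern above in Java. The class name `UrnCheck`, the method names, and the `WITH_DOT` variant are hypothetical additions for illustration. The variant assumes that dots should still be allowed in the last part, as in the question's example ABC123.-_12.

```java
import java.util.regex.Pattern;

public class UrnCheck {
    // The pattern from this answer, with each backslash doubled for Java.
    static final Pattern AS_ANSWERED = Pattern.compile(
            "^urn:abc:de:xyz:[0-9]{6,12}\\.[0-9]{1,7}\\.[a-zA-Z0-9\\-_]{1,20}$");

    // Hypothetical variant: '.' is allowed again in the last part, and '-' is
    // placed last in the class so it is a literal hyphen, not a range operator.
    static final Pattern WITH_DOT = Pattern.compile(
            "^urn:abc:de:xyz:[0-9]{6,12}\\.[0-9]{1,7}\\.[a-zA-Z0-9._-]{1,20}$");

    static boolean isValid(String s) {
        return AS_ANSWERED.matcher(s).matches();
    }

    static boolean isValidWithDot(String s) {
        return WITH_DOT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "urn:abc:de:xyz:234567.1890123.abd12",
            "urn:abc:de:xyz:234567.1890123",
            "urn:abc:de:xyz:234567.1890ACB.123",
            "urn:abc:de:xyz:234567.1890123.ABC123.-_12"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + isValid(s) + " / " + isValidWithDot(s));
        }
    }
}
```

Note that in the question's original class `[a-zA-Z0-9.-_]`, the sequence `.-_` is read as a range from `.` to `_`. That range covers digits, uppercase letters and several punctuation characters, but not the hyphen itself, which is why the hyphen is escaped or moved to the end here.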
alex-dl
  • But if I escape the `.` character then it would not check if there is a `.` character or not right? – BATMAN_2008 Feb 06 '21 at 09:07
  • @BATMAN_2008 No, it would only match exactly the dot, if you escape it. Otherwise it would match any non-newline character. – Sweeper Feb 06 '21 at 09:10
  • The backslash-dot sequence does match a dot. Once the dot is matched explicitly, it can be removed from the last pair of square brackets, if no more dots need to be matched. – alex-dl Feb 06 '21 at 09:11

This matches any number of dot-separated alphanumeric groups at the end of the string, as in your examples:

^urn:abc:de:xyz:\d{6,12}\.\d{1,7}(?:\.[\w-]{1,20})+$
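
As a rough sketch of how this could be used from Java (backslashes doubled for the string literal), with the class name `RepeatedGroupCheck` and method `isValid` being placeholders of my own:

```java
import java.util.regex.Pattern;

public class RepeatedGroupCheck {
    // (?:\.[\w-]{1,20})+ means: one or more groups, each a literal dot
    // followed by 1 to 20 word characters (letters, digits, '_') or hyphens.
    static final Pattern URN = Pattern.compile(
            "^urn:abc:de:xyz:\\d{6,12}\\.\\d{1,7}(?:\\.[\\w-]{1,20})+$");

    static boolean isValid(String s) {
        return URN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "urn:abc:de:xyz:234567.1890123.abd12",
            "urn:abc:de:xyz:234567.1890123.ABC123.-_12",
            "urn:abc:de:xyz:234567.1890123",
            "urn:abc:de:xyz:234567.1890ABC.ABC123.-_12"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

One consequence of this design is that the 20-character limit applies to each dot-separated group rather than to the whole tail, and an empty group (two dots in a row, or a trailing dot) is rejected.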


Toto