
I have a string that follows these rules:

  1. Add a capital letter, unique to the entire string.
  2. Then add one or more groups of the pattern \d+z, where \d is a digit, i.e. one or more digits followed by a 'z'.
  3. Repeat 1 and 2 above, zero or more times.

An example string that follows the above rules is:

"A42z19z037z21z"      +
"B942z21z4842z"       +
"C33z449z3884z68z20z"

(This is one string broken down for appearance.)
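The composition rules can be checked with a short Java sketch. It assumes "capital letter" means ASCII A-Z, and the class `FormatCheck` and method `isValid` are hypothetical names. One regex checks the letter-then-tokens shape, and a `Set` enforces the uniqueness in rule 1, which is awkward to express in a plain regex.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatCheck {
    // Rules 1-3: one or more blocks, each a capital letter followed by one or more \d+z groups
    private static final Pattern SHAPE = Pattern.compile("(?:[A-Z](?:\\d+z)+)+");
    private static final Pattern LETTER = Pattern.compile("[A-Z]");

    // True if s has the required shape and no capital letter occurs twice (rule 1)
    static boolean isValid(String s) {
        if (!SHAPE.matcher(s).matches()) {
            return false;
        }
        Set<Character> seen = new HashSet<>();
        Matcher m = LETTER.matcher(s);
        while (m.find()) {
            // Set.add returns false if this letter was already seen
            if (!seen.add(s.charAt(m.start()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("A42z19z037z21zB942z21z4842zC33z449z3884z68z20z")); // true
        System.out.println(isValid("A42zB1zA19z")); // false: 'A' repeats
    }
}
```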

I need a regex that effectively does the following:

  1. Go to a specified capital letter e.g. 'B'.
  2. Match each group of \d+z (see rule 2 above) between this capital letter and the next capital letter.

This seems to need two separate regexes, one to find the location of 'B', then one to match groups until the next capital letter. Can this be done in one regex?

EDIT:

So, using the above example, the matches would be "942z", "21z", and "4842z".
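For comparison, the two-step approach described above might look like this in Java. This is a minimal sketch, assuming ASCII capital letters; `TwoStep` and `tokensAfter` are hypothetical names, and `String.indexOf` stands in for the first regex because the letter is a literal. `Matcher.region` restricts the second search to the text between the letter and the next capital letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStep {
    private static final Pattern CAPITAL = Pattern.compile("[A-Z]");
    private static final Pattern TOKEN = Pattern.compile("\\d+z");

    // Returns the \d+z tokens between 'letter' and the next capital letter
    static List<String> tokensAfter(String s, char letter) {
        List<String> out = new ArrayList<>();
        // Step 1: locate the letter
        int start = s.indexOf(letter);
        if (start < 0) {
            return out; // letter not present
        }
        // The block ends at the next capital letter, or at the end of the string
        Matcher next = CAPITAL.matcher(s);
        int end = next.find(start + 1) ? next.start() : s.length();
        // Step 2: collect tokens, searching only inside [start + 1, end)
        Matcher m = TOKEN.matcher(s).region(start + 1, end);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "A42z19z037z21zB942z21z4842zC33z449z3884z68z20z";
        System.out.println(tokensAfter(s, 'B')); // [942z, 21z, 4842z]
    }
}
```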

danger mouse
  • What exactly do you need from the string? Only the `\d+z` groups, or the capital letter too? How do you see that as 2 separate regexes? – Rohit Jain Jul 31 '16 at 13:41
  • Rohit Jain: I don't need the capital letter in the match, just the groups as given in the example. The first regex would match the location of 'B' within the string. The second regex would only start to look for matches of the \d+z groups from that given location. – danger mouse Jul 31 '16 at 13:47
  • Mariano: the first step is adding a capital letter. In my example these are 'A', 'B', and 'C', so Step 1 is repeated two times (executed three times in total). – danger mouse Jul 31 '16 at 13:48
  • Something like [`([A-Z]|(?!\A)\G)(\d+z)`](https://regex101.com/r/zX4hQ4/1)? – Wiktor Stribiżew Jul 31 '16 at 13:52
  • @mlm: I tried to write some code with the regex above, please check http://ideone.com/cbwUM1 and let me know if you meant something like that. – Wiktor Stribiżew Jul 31 '16 at 14:12
  • @Wiktor Stribiżew: Thank you for the regex and the link. I tried removing the (?!\A) from the regex and it worked fine. I can't figure out why it's needed... – danger mouse Jul 31 '16 at 14:25
  • `\G` matches at the start of the string and at the end of the previous successful match. The lookahead removes the option to match at the beginning of the string. – Wiktor Stribiżew Jul 31 '16 at 14:26
  • @mlm test against `00z99zB942z21z`. Without `(?!\A)`, it would match the first 2 tokens, and it shouldn't. – Mariano Jul 31 '16 at 14:32
  • I had just run that test! Thanks, I was going to comment. I guess that if the input string is always composed according to the rules, i.e. it always starts with a capital letter, this part of the regex isn't needed. By the way, if it helps to clarify: Steps 1-3 at the top of my post describe how the input string is composed, not the matching process (Steps 1 and 2 later on). – danger mouse Jul 31 '16 at 14:37
  • It makes sense now. – Mariano Jul 31 '16 at 14:53
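Mariano's test can be reproduced with a short sketch that runs the pattern with and without the `(?!\A)` lookahead on `00z99zB942z21z` (the `AnchorDemo` class and `tokens` helper are hypothetical names). Without the lookahead, `\G` can match at index 0, so the tokens before `B` are picked up as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Collects Group 2 (the \d+z token) of every match of regex in s
    static List<String> tokens(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "00z99zB942z21z"; // Mariano's test input
        System.out.println(tokens("([A-Z]|(?!\\A)\\G)(\\d+z)", s)); // [942z, 21z]
        System.out.println(tokens("([A-Z]|\\G)(\\d+z)", s));        // [00z, 99z, 942z, 21z]
    }
}
```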

2 Answers


To answer the question: with Java regex, you cannot match and capture several values with one capturing group that has a quantifier after it (a so-called repeated capturing group); the group only keeps the value from its last iteration.
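A quick way to see this is a sketch that applies `B(\d+z)+` to the example string (`RepeatedGroup` and `firstMatch` are hypothetical names). The whole `B` block is matched, but Group 1 holds only the last repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroup {
    // Returns {whole match, Group 1} for the first match of B(\d+z)+, or null if none
    static String[] firstMatch(String s) {
        Matcher m = Pattern.compile("B(\\d+z)+").matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(), m.group(1)};
    }

    public static void main(String[] args) {
        String[] r = firstMatch("A42z19z037z21zB942z21z4842zC33z449z3884z68z20z");
        System.out.println(r[0]); // B942z21z4842z
        System.out.println(r[1]); // 4842z: only the last repetition is kept
    }
}
```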

I suggest using a regex with a \G-based boundary:

([A-Z]|(?!\A)\G)(\d+z)

See the regex demo

Pattern details:

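  • ([A-Z]|(?!\A)\G) - Group 1: either an uppercase ASCII letter, or an empty string at the end of the previous successful match (the (?!\A) lookahead stops \G from also matching at the very start of the string)
  • (\d+z) - Group 2: one or more digits followed by z.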

Here is a Java demo:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String value1 = "A42z19z037z21zB942z21z4842zC33z449z3884z68z20z";
String pattern1 = "([A-Z]|(?!\\A)\\G)(\\d+z)";
Pattern ptrn = Pattern.compile(pattern1);
Matcher matcher = ptrn.matcher(value1);
List<List<String>> resultLst = new ArrayList<>();
List<String> lst = null;
while (matcher.find()) {
    // Group 1 is a letter at the start of a block, or "" when \G continues the block
    if (!matcher.group(1).isEmpty()) {
        if (lst != null) resultLst.add(lst); // close the previous block
        lst = new ArrayList<>();
        lst.add(matcher.group(1));
    }
    // Every match carries a \d+z token in Group 2, including the first one after the letter
    lst.add(matcher.group(2));
}
if (lst != null) resultLst.add(lst);
System.out.println(resultLst);

Output: [[A, 42z, 19z, 037z, 21z], [B, 942z, 21z, 4842z], [C, 33z, 449z, 3884z, 68z, 20z]]
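To get just the tokens for one given letter, as the question asks, one possible adaptation (not part of the original answer) is to replace `[A-Z]` with that letter and make the first group non-capturing. After the last token of the block, the next capital letter makes the `\d+z` part fail at the `\G` position, and `\G` cannot match anywhere further along, so matching stops at the end of the block. `SingleLetter` and `tokensAfter` are hypothetical names, and `Pattern.quote` is used so the letter is always treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleLetter {
    // Tokens between 'letter' and the next capital letter, found with a single regex
    static List<String> tokensAfter(String s, char letter) {
        // Equivalent to (?:B|(?!\A)\G)(\d+z) when letter is 'B'; Group 1 is the token
        String regex = "(?:" + Pattern.quote(String.valueOf(letter)) + "|(?!\\A)\\G)(\\d+z)";
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "A42z19z037z21zB942z21z4842zC33z449z3884z68z20z";
        System.out.println(tokensAfter(s, 'B')); // [942z, 21z, 4842z]
    }
}
```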

Wiktor Stribiżew

Try:

([A-Z])*([\d]+z)

group 1 = capital letter, group 2 = target values
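To see what each group holds, a short sketch (the `StarGroup` class and `describe` method are hypothetical names) lists both groups for every match of this pattern on the example string. Group 1 is set only on the first token after each letter and is `null` otherwise, and the tokens of every block are matched, not only those after `B`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarGroup {
    // Returns "group1/group2" for every match of ([A-Z])*([\d]+z) in s
    static List<String> describe(String s) {
        Matcher m = Pattern.compile("([A-Z])*([\\d]+z)").matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // m.group(1) is null when ([A-Z])* matched zero times; concatenation yields "null"
            out.add(m.group(1) + "/" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("A42z19z037z21zB942z21z4842zC33z449z3884z68z20z"));
    }
}
```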

Andrew Nodermann