
My regex is something like this: **(A)(([+-]\d{1,2}[YMD])*)**. It matches as expected, e.g. A+3M, A-3Y+5M+3D, etc.

But I want to capture every repetition of the sub-pattern **([+-]\d{1,2}[YMD])***. For the example A-3M+2D, I can see only 4 groups: A-3M+2D (group 0), A (group 1), -3M+2D (group 2), +2D (group 3).

Is there a way I can get the **-3M** as a separate group?

Wiktor Stribiżew
nantitv
  • It is not a recursive pattern, it is a pattern with a *repeated capturing group*. What is your programming language? See also [How to capture multiple repeated groups?](https://stackoverflow.com/questions/37003623/how-to-capture-multiple-repeated-groups) – Wiktor Stribiżew Apr 06 '20 at 11:16
  • Corrected it. It's Kotlin. – nantitv Apr 06 '20 at 11:27
  • Marcin, please do not remove the `kotlin` tag, this is a Kotlin related question, as confirmed by OP. As the [regex tag info](http://stackoverflow.com/tags/regex/info) states, all questions with this tag should also include a tag specifying the applicable programming language or tool. – Wiktor Stribiżew Apr 06 '20 at 12:09

1 Answer

Repeated capturing groups usually keep only the value of their last iteration. This holds for Kotlin as well as Java, since neither exposes an API that keeps track of every capture a group made during the match.
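To see this behaviour with the regex from the question, here is a minimal sketch. The helper name `groupsOf` is hypothetical and only serves the illustration:

```kotlin
// Hypothetical helper: returns the group values of a full-string match, or null.
fun groupsOf(text: String): List<String>? {
    val regex = Regex("""(A)(([+-]\d{1,2}[YMD])*)""")
    return regex.matchEntire(text)?.groupValues
}

fun main() {
    // Group 3 sits inside the repeated part, so only its last iteration (+2D) survives;
    // the earlier iteration (-3M) is overwritten.
    println(groupsOf("A-3M+2D")) // [A-3M+2D, A, -3M+2D, +2D]
}
```

The `-3M` iteration is only recoverable as part of group 2, which is why a second step (split or extract) is needed.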

As a workaround, you may first validate the whole string against the pattern it should match, and then either extract the parts or split the string into them.

For the current scenario, you may use

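val text = "A-3M+2D"
// Validate the whole string first; matches() requires a full-string match
if (text.matches("""A(?:[+-]\d{1,2}[YMD])*""".toRegex())) {
  // Split at every position that is followed by - or +
  val results = text.split("(?=[-+])".toRegex())
  println(results)
}
// => [A, -3M, +2D]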

See the Kotlin demo

Here,

  • `text.matches("""A(?:[+-]\d{1,2}[YMD])*""".toRegex())` makes sure the whole string matches `A` and then 0 or more occurrences of `+` or `-`, 1 or 2 digits, and `Y`, `M` or `D`
  • `.split("(?=[-+])".toRegex())` splits the text at the empty position right before each `-` or `+`.

Pattern details

  • `^` - implicit in `.matches()` - start of string
  • `A` - an `A` substring
  • `(?:` - start of a non-capturing group:
    • `[+-]` - a character class matching `+` or `-`
    • `\d{1,2}` - one to two digits
    • `[YMD]` - a character class that matches `Y` or `M` or `D`
  • `)*` - end of the non-capturing group, repeat 0 or more times (due to the `*` quantifier)
  • `\z` - implicit in `.matches()` - end of string.
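The implicit anchoring in the list above can be checked with a small sketch that compares a full-string match with a partial search. The names `format`, `isWholeMatch` and `hasPartialMatch` are hypothetical:

```kotlin
// Same validation pattern as above, without explicit anchors.
val format = Regex("""A(?:[+-]\d{1,2}[YMD])*""")

// matches() succeeds only if the pattern covers the entire string.
fun isWholeMatch(text: String): Boolean = text.matches(format)

// containsMatchIn() succeeds if the pattern is found anywhere in the string.
fun hasPartialMatch(text: String): Boolean = format.containsMatchIn(text)

fun main() {
    println(isWholeMatch("A-3M+2D"))       // true
    println(isWholeMatch("A-3M+2Dxyz"))    // false: trailing text is not allowed
    println(hasPartialMatch("A-3M+2Dxyz")) // true: the prefix A-3M+2D is found
}
```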

When splitting, we just need to find the locations before `-` or `+`, hence we use a positive lookahead, `(?=[-+])`, that matches a position immediately followed by `+` or `-`. It is a non-consuming pattern: the `+` or `-` it checks for is not added to the match value, so the sign stays in the resulting part.
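The "extract" variant of the workaround is not shown above. A possible sketch, assuming you want each term as a signed amount plus a unit, could look like this. The `Offset` class, the `term` regex and `parseOffsets` are hypothetical names, not part of the original answer:

```kotlin
// Hypothetical type for one term such as -3M.
data class Offset(val amount: Int, val unit: Char)

// Validation pattern from the answer, plus a per-term pattern with two capturing groups.
val format = Regex("""A(?:[+-]\d{1,2}[YMD])*""")
val term = Regex("""([+-]\d{1,2})([YMD])""")

// Returns null for malformed input, otherwise one Offset per term.
fun parseOffsets(text: String): List<Offset>? {
    if (!text.matches(format)) return null
    return term.findAll(text)
        .map { Offset(it.groupValues[1].toInt(), it.groupValues[2][0]) }
        .toList()
}

fun main() {
    println(parseOffsets("A-3M+2D")) // [Offset(amount=-3, unit=M), Offset(amount=2, unit=D)]
    println(parseOffsets("A-3X"))    // null
}
```

Because the string is validated first, the second regex can stay simple, and every iteration becomes its own match with its own groups.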

Another approach with a single regex

You may also use a `\G` based regex to check the string format first at the start of the string, and only start matching consecutive substrings if that check is a success:

val regex = """(?:\G(?!^)[+-]|^(?=A(?:[+-]\d{1,2}[YMD])*$))[^-+]+""".toRegex()
// Each match is either the leading A or one signed term
println(regex.findAll("A-3M+2D").map { it.value }.toList())
// => [A, -3M, +2D]

See another Kotlin demo and the regex demo.

Details

  • `(?:\G(?!^)[+-]|^(?=A(?:[+-]\d{1,2}[YMD])*$))` - either the end of the previous successful match followed by `+` or `-` (see `\G(?!^)[+-]`), or (`|`) the start of the string, provided it is followed by `A` and then 0 or more occurrences of `+`/`-`, 1 or 2 digits and `Y`, `M` or `D` up to the end of the string (see `^(?=A(?:[+-]\d{1,2}[YMD])*$)`)
  • `[^-+]+` - 1 or more chars other than `-` and `+`. We need not be too careful here since the lookahead did the heavy lifting at the start of the string.
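As a rough check of what the lookahead buys you, the sketch below wraps the single regex in a hypothetical `splitTerms` function and feeds it a malformed string. It assumes Kotlin/JVM, where `\G` is handled by `java.util.regex`; other Kotlin targets may not support `\G`:

```kotlin
// Single-regex approach from the answer (assumes Kotlin/JVM for \G support).
val single = Regex("""(?:\G(?!^)[+-]|^(?=A(?:[+-]\d{1,2}[YMD])*$))[^-+]+""")

// Hypothetical helper: returns the parts, or an empty list if the format check fails.
fun splitTerms(text: String): List<String> =
    single.findAll(text).map { it.value }.toList()

fun main() {
    println(splitTerms("A-3M+2D")) // [A, -3M, +2D]
    // The lookahead at the start rejects the whole string, so nothing is matched at all.
    println(splitTerms("A-3M+2X")) // []
}
```

Note that an empty result here means "invalid format", so a caller has to treat it as a failure and not as a string without terms.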
Wiktor Stribiżew
  • It works fine. Thanks. Could you please also add an explanation of `?:` in the regex and `?=` in the split? – nantitv Apr 06 '20 at 11:48