
I'm trying to split a string into two parts using a regex, but the regex is apparently greedy, so the first group captures more than I want.

Example of string: "This is a phrase 22ext"

Desired result:

  • Group 0 = "This is a phrase"

  • Group 1 = "22"

  • The "ext" is discarded.

I'm using the following regex (in Java):

[^0-9]*([0-9]+).*

It works for Group 1, but in Group 0, it includes "22ext" as well.

How can I avoid it?

Nisanio

1 Answer


Your regex doesn't give the desired output because the first part of the pattern isn't enclosed in a group, so the regex has only one capturing group [1]. You can fix that by using:

([^0-9]*)([0-9]+).*

And then you can find your two strings in "Group 1" and "Group 2". Note that "Group 0" is the full match.
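
As a minimal sketch of how this might look in Java (the class name `PhraseSplitter` and the helper `split` are made up for illustration, and the `trim()` call is an assumption added because the first group also captures the space before the digits):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseSplitter {
    // Two capturing groups: the leading non-digits, then the first run of digits.
    private static final Pattern PATTERN = Pattern.compile("([^0-9]*)([0-9]+).*");

    // Hypothetical helper: returns {text, number}, or null if the pattern does not match.
    static String[] split(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // group(0) is the whole match; the two parts are in group(1) and group(2).
        // trim() is an addition: group(1) is "This is a phrase " with a trailing space.
        return new String[] { m.group(1).trim(), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("This is a phrase 22ext");
        System.out.println("[" + parts[0] + "]"); // [This is a phrase]
        System.out.println("[" + parts[1] + "]"); // [22]
    }
}
```

The trailing "ext" is consumed by `.*` but never captured, so it is effectively discarded.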


A better and shorter way is to use the following regex:

(\D*)(\d+)

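This matches any non-digit characters in the first group (up to the first digit) and then the digits that follow in the second group. Note that in Java, `\d` matches only the ASCII digits `[0-9]` by default; it matches all Unicode digits only when the pattern is compiled with the `Pattern.UNICODE_CHARACTER_CLASS` flag.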

Whether to include the .* at the end is up to you.
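
To illustrate when the trailing `.*` matters, here is a hedged sketch (the class and method names are hypothetical). It assumes `Matcher.find()` is used, which looks for a match anywhere in the input and therefore works without `.*`, whereas `Matcher.matches()` requires the pattern to cover the entire string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitSplit {
    // Shorter pattern without the trailing ".*"; backslashes are doubled in a Java string literal.
    private static final Pattern PATTERN = Pattern.compile("(\\D*)(\\d+)");

    // Hypothetical helper: returns {text, number} for the first match, or null if no digits are found.
    static String[] split(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return null;
        }
        // trim() is an addition to drop the space captured before the digits.
        return new String[] { m.group(1).trim(), m.group(2) };
    }

    // Shows the difference: matches() must consume the whole input.
    static boolean matchesWhole(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] parts = split("This is a phrase 22ext");
        System.out.println(parts[0]); // This is a phrase
        System.out.println(parts[1]); // 22
        System.out.println(matchesWhole("This is a phrase 22ext")); // false
    }
}
```

With `matches()`, the leftover "ext" makes the shorter pattern fail, which is the case where appending `.*` is needed.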



References:

[1] "Group 0" is the full match for the entire pattern, so you need to use "Group 1" and "Group 2".

  • Your "solution" regex is equivalent to the author's. Only the group numbering is changed. It would be just as easy to pick the other group, arguably easier. – Obicere May 15 '18 at 03:48
  • @Obicere, That's not true! The OP's regex only has **one group**, so I 1) pointed that out. 2) Provided an alternative solution to the problem described in the question _which I believe is better_. – 41686d6564 stands w. Palestine May 15 '18 at 03:51
  • There is literally no difference between the grouping count. This is the equivalent of padding an array with `null` entries. Properly referencing the group is a better solution. – Obicere May 15 '18 at 03:53
  • @Obicere `There is literally no difference between the grouping count` The OP specifically said that he wants the output split in _two_ groups. I explained why his regex doesn't do that **and** provided an additional solution to the problem. What do you see wrong with that?! I'm not sure what you mean by _"Properly referencing the group is a better solution"_. – 41686d6564 stands w. Palestine May 15 '18 at 03:57