
I have a regex statement that looks like this:

(.*)_(ce)_(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\\.[A-Za-z]{1,20})?

It's supposed to group (anything)_(ce)_(anything)_(eight digits).(some_ext).(some_possible_ext).

So, this is a possible passing string:

hello_ce_world_20192212.json.xml

The groups are:

1. hello
2. ce
3. world
4. 20192212
5. json
6. xml
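Here is a minimal sketch for checking which groups the first pattern actually captures. It assumes Python's `re` module, which the question does not mention. As a raw string, the `\\.` from the question becomes `\.`. One thing this makes visible is that the last group is written `(?:...)`, which is non-capturing. So `.xml` is matched but is not returned as group 6.

```python
import re

# The question's first pattern as a Python raw string.
# Note: the unescaped "." before the first extension matches any
# character, not only a literal dot.
PATTERN = re.compile(
    r"(.*)_(ce)_(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\.[A-Za-z]{1,20})?"
)

m = PATTERN.fullmatch("hello_ce_world_20192212.json.xml")
print(PATTERN.groups)  # 5 capturing groups; the (?:...) part is not one
print(m.groups())      # ('hello', 'ce', 'world', '20192212', 'json')
```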

I now want to make (ce) optional, so I changed the regex to this:

(.*)_(ce_)?(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\\.[A-Za-z]{1,20})?

With this, hello_ce_world_20192212.json.xml should pass, and the groups should be:

1. hello
2. ce
3. world
4. 20192212
5. json
6. xml

And this would pass: hello_world_20192212.json.xml, and the groups would be:

1. hello
3. world
4. 20192212
5. json
6. xml

So the regex matches both strings. The problem is that when ce_ is present in the text being evaluated, ce ends up in group 1 instead of group 2. For example, hello_ce_world_20192212.json.xml passes the regex, but the groups are:

1. hello_ce
3. world
4. 20192212
5. json
6. xml
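The behaviour can be reproduced with a short sketch, again assuming Python's `re`. Group 1 is a greedy `(.*)`, so the engine tries the longest possible group 1 first and gives characters back only until the rest of the pattern matches. `hello_ce_world` fails because no `_` plus eight digits follows it. `hello_ce` is the next candidate, and it works because `(ce_)?` is allowed to match nothing.

```python
import re

# The question's second pattern, with (ce_) made optional.
GREEDY = re.compile(
    r"(.*)_(ce_)?(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\.[A-Za-z]{1,20})?"
)

for name in ("hello_ce_world_20192212.json.xml", "hello_world_20192212.json.xml"):
    print(name, GREEDY.fullmatch(name).groups())

# hello_ce_world_20192212.json.xml ('hello_ce', None, 'world', '20192212', 'json')
# hello_world_20192212.json.xml ('hello', None, 'world', '20192212', 'json')
```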

This violates the constraint I described above. I'm not sure how to fix the regex. I suspect that it can't work because (ce_)? sits between two (.*) groups, or else my regex needs to be more specific. Thinking about it logically, it seems unlikely I can achieve what I want, but maybe someone out there understands this better. Any help?

I have found this website helpful for testing out what groups are where and stuff.

John Lexus

1 Answer


You can make the first group non-greedy (lazy) by adding a ? after the *. This regex should do what you need:

(.*?)_(ce)?_?(.*)_([0-9]{8})\.([A-Za-z]{1,20})?\.([A-Za-z]{1,20})?

as tested in https://regex101.com/r/MZqDPd/3

Also note the adjustments that make ce optional yet captured without the _. Because (ce)? and _? are now independent, strings where either one is missing can still pass the regex. Be aware of this.
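The sketch below runs the answer's pattern against a few names in Python, which is an assumption since the question names no language. The lazy `(.*?)` makes group 1 stop at the first `_` that still lets the rest of the pattern match. For comparison it includes `VARIANT`, a hypothetical pattern that is not part of the answer. `VARIANT` makes `ce_` optional as one unit with `(?:(ce)_)?`, and it makes the second extension, including its dot, optional again, as the question's original pattern did.

```python
import re

# The answer's pattern: lazy group 1, with (ce)? and _? independent.
ANSWER = re.compile(
    r"(.*?)_(ce)?_?(.*)_([0-9]{8})\.([A-Za-z]{1,20})?\.([A-Za-z]{1,20})?"
)

# Hypothetical variant (not from the answer): "ce" is captured only when
# followed by "_", and the second extension is optional.
VARIANT = re.compile(
    r"(.*?)_(?:(ce)_)?(.*)_([0-9]{8})\.([A-Za-z]{1,20})(?:\.([A-Za-z]{1,20}))?"
)

samples = [
    "hello_ce_world_20192212.json.xml",
    "hello_world_20192212.json.xml",
    "hello_world_20192212.json",           # only one extension
    "hello_cexyz_20192212.json.xml",       # "ce" not followed by "_"
    "my_file_ce_world_20192212.json.xml",  # underscore in the first part
]

for name in samples:
    for label, pattern in (("answer", ANSWER), ("variant", VARIANT)):
        m = pattern.fullmatch(name)
        print(label, name, m.groups() if m else None)
```

Both patterns give the groups the question asks for on the first two samples. With the answer's pattern, `hello_world_20192212.json` does not match, because the `\.` before the last group is required. `hello_cexyz_...` also yields `ce` as group 2 and `xyz` as group 3, which is the case the answer warns about. With either pattern, `my_file_ce_world_...` is cut at its first `_`: group 1 is `my`, and `ce` ends up inside group 3. If the first part can contain underscores, something more specific than `.*?` would be needed there.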

sal