I have a regex statement that looks like this:
(.*)_(ce)_(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\\.[A-Za-z]{1,20})?
It's supposed to group (anything)_(ce)_(anything)_(some digits).(some_ext).(some_possible_ext).
So, this is a possible passing string:
hello_ce_world_20192212.json.xml
The groups are:
1. hello
2. ce
3. world
4. 20192212
5. json
6. xml
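To make this easy to reproduce, here is a minimal sketch of the check. It assumes Python's `re` module (the question doesn't say which engine is used) and that the `\\.` in the pattern is a string-escaped `\.`. Note that with the pattern exactly as written, the trailing extension sits in a non-capturing `(?:...)` group, so only five groups are reported in this sketch.

```python
import re

# Assumption: "\\." in the pattern above is a string-escaped "\.",
# so the raw string below uses a single backslash.
PATTERN = re.compile(
    r"(.*)_(ce)_(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\.[A-Za-z]{1,20})?"
)

match = PATTERN.fullmatch("hello_ce_world_20192212.json.xml")
groups = match.groups()

# The optional second extension is matched but not captured.
print(groups)  # ('hello', 'ce', 'world', '20192212', 'json')
```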
I now want to make the (ce) optional, so I changed the regex to look like this:
(.*)_(ce_)?(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\\.[A-Za-z]{1,20})?
Such that this would pass: hello_ce_world_20192212.json.xml, and the groups would be:
1. hello
2. ce
3. world
4. 20192212
5. json
6. xml
And this would pass: hello_world_20192212.json.xml, and the groups would be:
1. hello
3. world
4. 20192212
5. json
6. xml
So, the regex matches both strings. The problem is that when ce_ is present in the text being evaluated, it ends up in group one instead of group two. So, hello_ce_world_20192212.json.xml passes the regex, but the groups are:
1. hello_ce
3. world
4. 20192212
5. json
6. xml
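The behaviour can be reproduced with the sketch below. It again assumes Python's `re` module and a single-backslash `\.`, neither of which is stated in the question. The first `(.*)` is greedy, so it grabs as much as it can and only backtracks until the rest of the pattern can match; since `(ce_)?` is allowed to match nothing, the engine never has to give `ce` back.

```python
import re

# Assumption: same pattern as above, with "\\." read as an escaped dot.
OPTIONAL_CE = re.compile(
    r"(.*)_(ce_)?(.*)_([0-9]{8}).([A-Za-z]{1,20})(?:\.[A-Za-z]{1,20})?"
)

with_ce = OPTIONAL_CE.fullmatch("hello_ce_world_20192212.json.xml").groups()
without_ce = OPTIONAL_CE.fullmatch("hello_world_20192212.json.xml").groups()

# Group 2 is None in both cases: the greedy group 1 has already taken "_ce".
print(with_ce)     # ('hello_ce', None, 'world', '20192212', 'json')
print(without_ce)  # ('hello', None, 'world', '20192212', 'json')
```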
This violates the grouping I described above. I'm not sure how to fix the regex to get that grouping; I suspect that because the optional group sits between two (.*) groups, it won't work, or my regex needs to be more specific. Just thinking about it logically makes me suspect that it's unlikely I can achieve what I want... but maybe someone out there has a better understanding. Any help?
I have found this website helpful for testing out what groups are where and stuff.