This is an example string:

123456#p654321

Currently, I am using this pattern to capture 123456 and 654321 into two different groups:

([0-9].*)#p([0-9].*)

But occasionally the #p654321 part of the string will be missing, in which case I only want to capture the first group. I tried to make the second group "optional" by appending ? to it, which works, but only as long as there is a #p at the end of the remaining string.

What would be the best way to solve this problem?

user1447941

2 Answers

You have the #p outside of the capturing group, which makes it a required part of the match. You are also using the dot character (.) improperly: in most reg-ex variants a dot matches any character, so [0-9].* matches one digit followed by anything at all, not a run of digits. Change it to:

([0-9]*)(?:#p([0-9]*))?

The (?:...) syntax creates a non-capturing group. Inside it, we capture just the digits that you're interested in, and the trailing ? makes the whole #p part optional.
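As a rough illustration in Python's re module (the question does not say which reg-ex engine is in use, so Python is an assumption here, and split_ids is a hypothetical helper name), the optional non-capturing group behaves like this:

```python
import re

# Pattern from the answer: digits, then an optional non-capturing "#p" + digits.
PATTERN = re.compile(r"([0-9]*)(?:#p([0-9]*))?")

# The pattern from the question, kept for comparison.
ORIGINAL = re.compile(r"([0-9].*)#p([0-9].*)")


def split_ids(text):
    """Hypothetical helper: return (first, second) or None if nothing matches.

    second is None when the "#p..." part is absent.
    """
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_ids("123456#p654321"))   # ('123456', '654321')
print(split_ids("123456"))           # ('123456', None)
print(ORIGINAL.fullmatch("123456"))  # None, because "#p" is required there
```

In Python, a group that did not take part in the match is reported as None rather than as an empty string; other engines may report it differently.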

Also, most reg-ex variants have a \d character class for digits. So you could simplify even further:

(\d*)(?:#p(\d*))?

As another person has pointed out, the * operator could potentially match zero digits. To prevent this, use the + operator instead:

(\d+)(?:#p(\d+))?
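
The difference between the * and + versions only shows up on inputs with missing digits. A small sketch, again assuming Python's re module (STAR and PLUS are illustrative names, not part of the original answer):

```python
import re

STAR = re.compile(r"(\d*)(?:#p(\d*))?")  # allows zero digits
PLUS = re.compile(r"(\d+)(?:#p(\d+))?")  # requires at least one digit


def groups_or_none(pattern, text):
    """Return the captured groups for a full match, or None if there is none."""
    match = pattern.fullmatch(text)
    return match.groups() if match else None


for text in ("", "#p", "123456", "123456#p654321"):
    print(repr(text), groups_or_none(STAR, text), groups_or_none(PLUS, text))
```

With these inputs, the * version accepts both the empty string and a bare "#p", while the + version rejects them and still accepts the two well-formed strings.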
Jonah Bishop
  • I tried that previously, and it works; however, if there is a second part to the string, then the first group is the whole string and nothing is in the second group. – user1447941 Sep 17 '12 at 00:06
  • The dot in your reg-ex is causing your problem. See my revision. – Jonah Bishop Sep 17 '12 at 00:07
  • And now the second group is `#p654321`. It's visible that it's a part of the group match. – user1447941 Sep 17 '12 at 00:10
  • Ah, so you just want the digits? I misunderstood your goal. – Jonah Bishop Sep 17 '12 at 00:11
  • Thanks, you sorted out my big issues. – Manu Nair Jan 22 '15 at 13:47
  • I think it should be mentioned that a capture group after the optional group will keep its reference even if the optional group is not there (as in `\2` will be empty and a fictional third capture group would still be referenced as `\3`, not as `\2`). Just if anybody wonders. – BUFU Nov 11 '20 at 11:18
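
The point about group numbering in the last comment can be checked with a made-up third group. In this sketch the trailing "-(\w+)" part and the sample strings are purely hypothetical additions for illustration, and Python's re module is assumed:

```python
import re

# Hypothetical extension of the answer's pattern with a third capture group.
WITH_SUFFIX = re.compile(r"(\d+)(?:#p(\d+))?-(\w+)")

with_optional = WITH_SUFFIX.fullmatch("123456#p654321-abc")
without_optional = WITH_SUFFIX.fullmatch("123456-abc")

print(with_optional.groups())     # ('123456', '654321', 'abc')
print(without_optional.groups())  # ('123456', None, 'abc')
```

Group 3 keeps its number in both cases; only the value of group 2 changes when the optional part is missing.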
Your regex can actually match zero digits, because you've used * instead of +.
This is what (I think) you want:

(\d+)(?:#p(\d+))?
Bohemian
  • You are correct; the `+` would be a better operator to use. I thought about making that change, but the OP didn't specify whether or not no-digit scenarios were a possibility. As such, I tried to keep it as close to his original as possible. – Jonah Bishop Sep 17 '12 at 00:28