
Tcl: Can anyone explain ?: in a regular expression?

I am confused about the difference between ? and ?:.

? means the preceding character may or may not be present.

So I don't understand what (?:) indicates.

Can anyone please explain this?

([0-9]+(?:\.[0-9]*)?)
raina77ow
user2742564
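A quick way to see the difference is to run the question's regex in tclsh next to a variant whose inner group captures. This is a minimal sketch: the sample string "pi is 3.1415" is a hypothetical input, and it relies on `regexp -inline` returning the overall match followed by one element per capturing group.

```tcl
# The pattern from the question: the inner group is non-capturing
set nonCapturing {([0-9]+(?:\.[0-9]*)?)}
# The same pattern with an ordinary (capturing) inner group, for comparison
set capturing {([0-9]+(\.[0-9]*)?)}

set text "pi is 3.1415"   ;# hypothetical sample input

# -inline returns the whole match followed by one element per capturing group
set r1 [regexp -inline $nonCapturing $text]
set r2 [regexp -inline $capturing $text]

puts $r1   ;# 3.1415 3.1415
puts $r2   ;# 3.1415 3.1415 .1415
```

Both patterns match the same text; the only difference is the extra .1415 element produced when the inner group captures.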
  • `?` has lots of special uses in regular expressions; its meaning depends on what's to the left of it. These meanings are often unrelated to each other. – Barmar Sep 14 '13 at 08:49
  • Well, `(?` has a lot of meanings depending on the next 1-2 characters. – chx Sep 14 '13 at 08:51
  • @chx: which characters other than : can follow (? to change its meaning, as in (?:)? – user2742564 Sep 14 '13 at 08:53
  • I don't think I can possibly list all of them; look at http://www.regular-expressions.info/refadv.html for a ton of possibilities: modifiers for subexpressions, lookaheads, conditionals, and subroutines, as described in http://stackoverflow.com/questions/4941259/pcre-regular-expressions-using-named-pattern-subroutines – chx Sep 14 '13 at 08:55
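As a small illustration of the comments above, here is a sketch of how the character after (? changes what a group does in Tcl's advanced regular expressions (documented on the re_syntax manual page): (?:re) groups without capturing, (?=re) is a positive lookahead and (?!re) a negative lookahead. The input string is a hypothetical example, and this covers only three of the many constructs the linked page lists.

```tcl
set text "100EUR 200USD"   ;# hypothetical sample input

# (?:re) groups without capturing; the group's text is still part of the match
set grouped [regexp -inline {(?:[0-9]+)USD} $text]

# (?=re) positive lookahead: digits followed by USD, but USD is not consumed
set ahead [regexp -inline {[0-9]+(?=USD)} $text]

# (?!re) negative lookahead: a digit run not followed by another digit or by USD
set notAhead [regexp -inline {[0-9]+(?![0-9]|USD)} $text]

puts $grouped    ;# 200USD
puts $ahead      ;# 200
puts $notAhead   ;# 100
```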

3 Answers


Suppose you were looking for something like ABC123 or ABC123.45 in your input string and wanted to capture the letters and the numbers separately. You would use a regex (a bit similar to yours) like

([A-Z]+)([0-9]+(\.[0-9]+)?)

The above regex would match ABC123.45 and also provide three groups, which represent sub-parts of the whole match and are determined by where you put the () brackets. So, given our regex (without ?:), we get

Group 1 = ABC
Group 2 = 123.45
Group 3 = .45
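In Tcl, the capturing version can be checked by passing one variable per group to `regexp`; the variable names g1, g2, g3 (and h1, h2, h3) are only for illustration. When the optional decimal part is absent, Tcl sets the corresponding group variable to an empty string.

```tcl
set text "ABC123.45"   ;# sample input from the answer

# Every pair of parentheses captures, so regexp fills three group variables
# ("->" receives the overall match, which is not needed here)
regexp {([A-Z]+)([0-9]+(\.[0-9]+)?)} $text -> g1 g2 g3

puts "Group 1 = $g1"   ;# Group 1 = ABC
puts "Group 2 = $g2"   ;# Group 2 = 123.45
puts "Group 3 = $g3"   ;# Group 3 = .45

# Without a decimal part, the optional third group does not participate
regexp {([A-Z]+)([0-9]+(\.[0-9]+)?)} "ABC123" -> h1 h2 h3
puts "Group 3 = '$h3'"   ;# Group 3 = ''
```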

Now, it may not make much sense to always capture the decimal portion separately, since it has already been captured as part of Group 2. So how would you make that group non-capturing? By putting ?: right after its opening parenthesis:

([A-Z]+)([0-9]+(?:\.[0-9]+)?)

Now you get only the two desired groups:

Group 1 = ABC
Group 2 = 123.45
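The same check with the non-capturing group, as a sketch assuming Tcl 8.5 or later (for `lassign`): `regexp -inline` now returns only three elements, the overall match plus the two remaining groups. The variable names are illustrative.

```tcl
set text "ABC123.45"

# (?:...) still groups the optional decimal part but no longer captures it
set result [regexp -inline {([A-Z]+)([0-9]+(?:\.[0-9]+)?)} $text]

# Overall match followed by the two capturing groups
lassign $result whole g1 g2

puts [llength $result]   ;# 3
puts "Group 1 = $g1"     ;# Group 1 = ABC
puts "Group 2 = $g2"     ;# Group 2 = 123.45
```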

Notice that I also changed the last part of the regex from \.[0-9]* to \.[0-9]+. The dot is then consumed only when at least one digit follows it, so for input like ABC123. (a number with a dot but no decimal part) Group 2 is just 123 rather than 123.
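To see what the change from * to + does, here is a sketch that runs both variants on a hypothetical input with a trailing dot, ABC123.; the variable names are illustrative.

```tcl
set text "ABC123."   ;# hypothetical input: a dot with no digits after it

# With \.[0-9]* the optional group can consume a lone dot
regexp {([A-Z]+)([0-9]+(?:\.[0-9]*)?)} $text -> starLetters starNumber

# With \.[0-9]+ a dot is consumed only when at least one digit follows it
regexp {([A-Z]+)([0-9]+(?:\.[0-9]+)?)} $text -> plusLetters plusNumber

puts $starNumber   ;# 123.
puts $plusNumber   ;# 123
```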

Ravi K Thapliyal

?: still groups, it just doesn't create a capturing group. For example, a(?:b) will match the "ab" in "abc" without capturing the b separately.
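In Tcl this can be seen with `regexp -inline`, which lists the overall match and then one element per capturing group; the capturing form a(b) is included only for comparison.

```tcl
# Non-capturing: only the overall match is returned
set nc [regexp -inline {a(?:b)} "abc"]
# Capturing, for comparison: the overall match plus the group
set c [regexp -inline {a(b)} "abc"]

puts $nc   ;# ab
puts $c    ;# ab b
```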

Rahul Tripathi

As mentioned in the re_syntax manual page of the Tcl documentation, putting ?: at the start of a parenthesized group turns off capturing for that group. In other words, the expression (\d)(\d) matches 2 digits and makes each one available in a separate match group. The expression (\d)(?:\d) matches the same 2 digits but captures only the first one. Specifically for Tcl:

regexp {(\d)(\d)} $data -> first second

will make the first and second digits available in the named variables. Each capturing group adds one result to the overall match, so (\d)(\d) provides 3 results, (\d)(?:\d) provides 2, and the fully non-capturing (?:\d)(?:\d) provides only 1: the single match from the target. So your expression has 2 outputs: one for everything matched and one for the outer parentheses. The inner parentheses form a regexp group but do not produce another match output. So you have something that matches a decimal number such as 3.1415, 0., or 10.
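Both points can be checked in tclsh. This sketch uses the hypothetical input 42 to count how many results `regexp -inline` returns for each form, then runs the question's regex over the sample numbers mentioned above.

```tcl
# One result for the overall match, plus one per capturing group
set counts {}
foreach re {{(\d)(\d)} {(\d)(?:\d)} {(?:\d)(?:\d)}} {
    lappend counts [llength [regexp -inline $re 42]]
}
puts $counts   ;# 3 2 1

# The question's regex: the outer group captures, the inner one does not
set nums {}
foreach s {3.1415 0. 10} {
    regexp {([0-9]+(?:\.[0-9]*)?)} $s -> num
    lappend nums $num
}
puts $nums   ;# 3.1415 0. 10
```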

patthoyts