
I want to match comments in code (everything from "--" to the end of the line) using regular expressions in Tcl.

So I tried {\\-\\-.*$}, which should match a dash, then another dash, then any number of any characters, and then the end of the line. But it doesn't work!

Another post here suggested using .*? instead of .*.

So I tried {\\-\\-.*?$} and that works.

I just want to understand the difference between the two. According to every regular expression tutorial and man page I have read, what .*? matches should be a subset of what .* matches, so I am wondering what's going on there.

melpomene
  • Could you please share a test case? Actually, `\-\-.*$` should produce the same result as `\-\-.*?$` – Wiktor Stribiżew May 28 '18 at 12:56
  • I think the title contains a typo: `.*` or `.*?`. The difference is greedy vs. lazy quantifiers: the first tries to repeat first, then backtracks to the following pattern; the second tries the following pattern first, then backtracks to repeat – Nahuel Fouilleul May 28 '18 at 12:58
  • "*it doesn't work*" is not a problem description. – melpomene May 28 '18 at 13:07
  • @user, you'll want to read [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) – glenn jackman May 28 '18 at 16:30
  • OK, my fault: I was using \-\-[.]*$ and not \-\-.*$. I wasn't aware that the square brackets escape the . so that it is interpreted as "the actual dot character" and not as "any char" – user9858829 May 28 '18 at 18:53
  • Here is a test case: – user9858829 May 28 '18 at 18:54
  • set tstCases {
        {---------------}
        {--Un commentaire}
        { toto<='1'; --un commentaire en ligne}
        { toto<=A - B; --un commentaire en ligne avec un piege}
        { toto<=A - B; --.}
    }
    set re1 {\-\-.*?$}
    set re2 {\-\-.*$}
    set re3 {\-\-[.]*$}
    foreach case $tstCases {
        puts $case
        puts "re1catch :[regexp -all -inline $re1 $case]"
        puts "re2catch :[regexp -all -inline $re2 $case]"
        puts "re3catch :[regexp -all -inline $re3 $case]"
    }
    – user9858829 May 28 '18 at 18:54
  • So yes, they both effectively produce the same result, which is what I expected too. It also made me learn about lazy matching, which is good to know. Thanks for all your feedback. – user9858829 May 28 '18 at 18:56
  • @melpomene the question is not about the example; I only mentioned that I ended up on a case where it did make a difference, to give some context. My question is about understanding the difference it makes in the general case. – user9858829 May 25 '23 at 12:05
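
The bracket-expression pitfall described in the comments above can be reproduced with a short sketch. It reuses two of the posted test strings; the variable names `dotAny` and `dotOnly` are made up for illustration. Inside square brackets the dot loses its "any character" meaning, so `[.]*` can only consume literal dots.

```tcl
# Outside brackets, . matches any character.
set dotAny  {--.*$}
# Inside a bracket expression, . is just a literal dot.
set dotOnly {--[.]*$}

# Two of the test strings posted in the comments.
set cases [list {--Un commentaire} { toto<=A - B; --.}]

foreach case $cases {
    puts "input: '$case'"
    puts "  any-char:    [regexp -inline -- $dotAny $case]"
    puts "  literal-dot: [regexp -inline -- $dotOnly $case]"
}
```

With `dotOnly`, the first string yields no match because the text after `--` is not made up of dots, while the second string still matches `--.`. That would explain why switching patterns appeared to "fix" the problem.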

2 Answers


"?" makes de previous quantifier lazy, making it match as few characters as posible.

Peter

This is documented in the re_syntax man page. The question mark indicates the match should be non-greedy.

Let's look at an example:

% set string "-1234--ab-c-"
-1234--ab-c-
% regexp -inline -- {--.*-} $string
--ab-c-
% regexp -inline -- {--.*?-} $string
--ab-

The 1st match is greedy, matching to the last dash following the double dash.
The 2nd match is not greedy, only matching to the first dash following the double dash.
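
Applied to the patterns from the question, where the quantifier is followed by `$` rather than by a dash, the two forms should agree on a single-line string. The following is a minimal sketch, assuming the input is one line with no embedded newline; the sample string is borrowed from the test cases posted in the comments, and the variable names are illustrative.

```tcl
# Sample line borrowed from the test cases in the question's comments.
set line { toto<=A - B; --un commentaire en ligne avec un piege}

# Greedy: prefers the longest match starting at the first "--".
set greedy [lindex [regexp -inline -- {--.*$} $line] 0]

# Non-greedy: prefers the shortest match, but $ can only succeed at the
# end of the string, so the shortest possible match is the whole comment.
set lazy [lindex [regexp -inline -- {--.*?$} $line] 0]

puts "greedy: $greedy"
puts "lazy:   $lazy"
puts "same:   [expr {$greedy eq $lazy}]"
```

Both patterns return the comment from `--` to the end of the string, so greediness only changes the result when something other than the end anchor can terminate the match earlier.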

Note that the Tcl regex engine has a quirk: the first quantifier's greediness sets the greediness of the whole regex. This is documented (IMO obscurely) in the MATCHING section:

... A branch has the same preference as the first quantified atom in it which has a preference.

Let's try to match all the digits, then the double dash, and see how the non-greedy quantifier behaves:

% regexp -inline -- {\d+--.*-} $string
1234--ab-c-
% regexp -inline -- {\d+--.*?-} $string
1234--ab-c-

Oops, the whole match is greedy, even though we asked for some non-greediness. To get the shorter match, we need to either make the first quantifier non-greedy as well:

% regexp -inline -- {\d+?--.*?-} $string
1234--ab-

or make all the quantifiers greedy and use a negated bracket expression:

% regexp -inline -- {\d+--[^-]*-} $string
1234--ab-
glenn jackman
  • Curious voting pattern (currently 2 upvotes, 2 downvotes). Downvoters, care to comment? – glenn jackman May 28 '18 at 14:50
  • I haven't voted, but I would suspect the downvotes are because you haven't actually addressed OP's case. Even if `?` does make the regex lazy, that still doesn't explain why `\\-\\-.*?$` produces a match and `\\-\\-.*$` doesn't, and your examples don't illuminate this either. As far as I can tell, both should be equivalent in behavior. – JLRishe May 28 '18 at 16:03
  • I suppose. I agree they should act the same. We're missing any feedback from the OP though. – glenn jackman May 28 '18 at 16:33
  • Well, I do agree with you; in my mind it should also produce the same result, but it doesn't. My question is not so much about the particular example: I have one pattern that works, so that project has lived its life, no problem. I am just trying to figure out the why, because it makes me curious. – user9858829 May 25 '23 at 12:11
  • It would be helpful if you could provide an example string where the lazy regex matches but the greedy one does not. – glenn jackman May 25 '23 at 16:31