
This is my regular expression (each \ is doubled because it is in Java code):

\\*\\{[\\w\\W]*\\}\\*

The actual regex would be

\*\{[\w\W]*\}\*

but this expression is not greedy on the }* side. I am trying to match everything between the { and the } so if I have

*{ some comment }* hi there and some stuff *{ comment 2 }* and soe more stuff

should end up with

hi there and some stuff and soe more stuff

but instead it is not greedy enough. There is info on greedy quantifiers here, and I thought I wanted X1, which would be

\\*\\{[\\w\\W]*\\}1\\* or \\*\\{[\\w\\W]*\\}{1}\\*

but that doesn't work. How do I use their X{n} quantifier to force greedy matching in this example?

Dean Hiller
  • hmmm, I think the \w\W is screwing it up, since that would also match the }*, but I really want to match a * or } in the middle, up until the next }* ...still not sure how to do this one. – Dean Hiller Aug 01 '13 at 16:10

2 Answers


Use replaceAll with your regex, but add a ? after the * so that [\w\W]* is no longer greedy, like this:

String yourString = "*{ some comment }* hi there and some stuff *{ comment 2 }* and soe more stuff";
// Strings are immutable, so assign the result of replaceAll back to the variable.
yourString = yourString.replaceAll("\\*\\{[\\w\\W]*?\\}\\*", "");

Then you will get: hi there and some stuff and soe more stuff (the spaces that surrounded each comment are left in place).
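As a minimal, self-contained sketch of the replacement above: the class name CommentStripDemo and the method stripComments are made up for illustration, and the pattern is the one from this answer, precompiled once. The brackets in the printed output are only there to make the leftover whitespace visible.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class CommentStripDemo {
    // Reluctant quantifier: [\w\W]*? stops at the first "}*" it can reach.
    private static final Pattern COMMENT =
            Pattern.compile("\\*\\{[\\w\\W]*?\\}\\*");

    static String stripComments(String input) {
        // Strings are immutable, so the stripped text is returned as a new String.
        return COMMENT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "*{ some comment }* hi there and some stuff "
                + "*{ comment 2 }* and soe more stuff";
        // The spaces around each removed comment stay, so the result has a
        // leading space and a double space in the middle.
        System.out.println("[" + stripComments(text) + "]");
    }
}
```

If the leftover whitespace matters, it would need a separate cleanup step; the regex here only removes the comment markers and what is between them.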

Angga
  • that worked great!!!! How come the ? goes after the \\w\\W and before the }? Does the ? really switch the thing after it to greedy? I had put a ? after the }, which didn't work, and you put it before... (this worked great and now I am just trying to understand why) – Dean Hiller Aug 01 '13 at 16:31
  • the `?` does not make anything greedy; `*` on its own is already greedy. `*?` is the lazy form: zero or more occurrences, but as few as possible. – Angga Aug 01 '13 at 17:09
  • oh, yes, I see the X*? on http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html listed as a "reluctant quantifier". Does this mean it takes as few characters as possible while still letting the rest of the pattern match? If so, I think I understand. – Dean Hiller Aug 02 '13 at 20:43
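To make the greedy versus reluctant difference from these comments concrete, here is a small sketch that prints the first match each form finds in the same input. The class GreedyVsReluctant and the helper firstMatch are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class GreedyVsReluctant {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "*{ some comment }* hi there *{ comment 2 }* end";

        // Greedy: [\w\W]* grabs as much as it can, then backtracks to the LAST "}*".
        System.out.println(firstMatch("\\*\\{[\\w\\W]*\\}\\*", text));

        // Reluctant: [\w\W]*? grows one character at a time and stops at the FIRST "}*".
        System.out.println(firstMatch("\\*\\{[\\w\\W]*?\\}\\*", text));
    }
}
```

The greedy form swallows both comments and the text between them in a single match, which is why the original pattern removed too much rather than too little.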

Try something like this:

\*\{((?!\}\*).)*\}\*

Or in Java form:

\\*\\{((?!\\}\\*).)*\\}\\*

It uses a negative lookahead to distinguish the }* closing tag from } alone. That's the ((?!\}\*).)* part.

Edit: Here's a (Java) version that allows newlines. You can also use Pattern.DOTALL to make . include newlines, so the above patterns will work.

\\*\\{((?!\\}\\*)[\\s\\S])*\\}\\*
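As a rough sketch of the Pattern.DOTALL route mentioned above, the dot-based pattern can be compiled with that flag and applied to multi-line input. The class name LookaheadStripDemo and the method stripComments are assumptions made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class LookaheadStripDemo {
    // Dot-based lookahead pattern from this answer; DOTALL lets '.' match
    // line terminators as well.
    private static final Pattern COMMENT =
            Pattern.compile("\\*\\{((?!\\}\\*).)*\\}\\*", Pattern.DOTALL);

    static String stripComments(String input) {
        return COMMENT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The comment spans two lines and contains a lone '}' that is not
        // followed by '*', so it does not end the comment.
        String text = "keep *{ line one\nline } two }* this";
        System.out.println("[" + stripComments(text) + "]");
    }
}
```

The negative lookahead is checked before every consumed character, so the match can never run past a }* sequence.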

Note that this will not be recursive. You can't have something like *{ foo *{ bar }* }* and have the whole thing treated as a comment. That would make this a context-free grammar, and trying to parse CFGs is among the most famous no-nos with regex.
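A short sketch of what that limitation looks like in practice, using the newline-tolerant pattern on nested input. The class name NestedCommentDemo and the method stripComments are hypothetical, chosen only for this illustration.

```java
// Hypothetical demo class; the names are illustrative only.
final class NestedCommentDemo {
    static String stripComments(String input) {
        // Newline-tolerant lookahead pattern from this answer.
        return input.replaceAll("\\*\\{((?!\\}\\*)[\\s\\S])*\\}\\*", "");
    }

    public static void main(String[] args) {
        // The match starts at the outer "*{" but ends at the FIRST "}*",
        // which belongs to the inner comment, so the outer "}*" is left behind.
        System.out.println("[" + stripComments("a *{ foo *{ bar }* }* b") + "]");
    }
}
```

Handling real nesting would need something that tracks depth, such as a small hand-written scanner, rather than this pattern.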

Justin Morgan - On strike
  • I suppose you might be able to pull something sneaky with recursive regex (assuming Java supports it), but agreed. This was roughly what I was going to suggest. – zebediah49 Aug 01 '13 at 16:24
  • hmmm, didn't work... I do have newlines in the actual code. Is this going to work with newlines? (My first version had this issue in Java, which is why I didn't use .*) – Dean Hiller Aug 01 '13 at 16:28
  • @zebediah49 - Yup, recursive regex would probably work. I didn't mention that because I'm not sure either whether Java supports them, and because I'd still consider that a no-no here based on the level of pain and effort. – Justin Morgan - On strike Aug 01 '13 at 16:30
  • the other answer seems to work because I think it accounts for newlines – Dean Hiller Aug 01 '13 at 16:32
  • @DeanHiller - Did you see the edited version? It allows newlines. You can also use `Pattern.DOTALL` for that. – Justin Morgan - On strike Aug 01 '13 at 16:36