
I have this string pattern I'm trying to match. It can contain some words enclosed within two curly braces. These curly braces have to be well-formed and they cannot be nested.

For example:

I have this {{example}} which is right
Here {{is}} another {{example}} which is right
{{ This {{ example }} is wrong }}
{{ This }}example {{ is also { wrong }}

I have constructed this regex which seems to work but it is very long and I think it can be simplified.

"^([^{}]*\\{\\{([^{}]*)\\}\\}[^{}]*)+$"

Is there any way I can simplify this?

Wiktor Stribiżew
Ogen
  • What simplification do you seek? Remove the ``\`` from inside the character classes *and* make all lazy quantifiers greedy since they act the same here (see http://ideone.com/kJeHM6). – Wiktor Stribiżew Aug 01 '16 at 13:17
  • @WiktorStribiżew Whoops, I fixed that. I meant simplifications besides that haha – Ogen Aug 01 '16 at 13:18
  • I updated the comment above, you can also change all `*?` to `*`. – Wiktor Stribiżew Aug 01 '16 at 13:20
  • @WiktorStribiżew Which would make the `*` greedy again. Does this not matter because I have the negative character classes? I'd like to know the reason why – Ogen Aug 01 '16 at 13:22
  • Compare https://regex101.com/r/kQ3xP9/1 and https://regex101.com/r/kQ3xP9/2. It depends on what you need to do with the captures. The greedy quantifiers work faster, and since I believe you need no captured values here, you do not need lazy quantifiers. – Wiktor Stribiżew Aug 01 '16 at 13:25
  • Are you trying to match the string or find the contents of the braces? If you want to match, you haven't explained enough about what is allowed. Specifically, does it have to have at least one set of curlies and what can go in them and what is allowed outside of the curly braces? – JimmyJames Aug 01 '16 at 13:26
  • @JimmyJames There can be zero to many sets of curlies. And anything can go inside and outside the curlies. You just can't have nested curlies. I believe my examples are sufficient – Ogen Aug 01 '16 at 13:27
  • @WiktorStribiżew Ok I changed it, is that it? So it can't be made simpler? You should post an answer so I can accept it – Ogen Aug 01 '16 at 13:38
  • I have {} this {{ example }} which is ??? – JimmyJames Aug 01 '16 at 13:46
  • @JimmyJames That is fine I think. So is `this is {{{my}}} example`. – Ogen Aug 01 '16 at 13:48
  • By fine do you mean it should match? – JimmyJames Aug 01 '16 at 13:49
  • Another: I have this { {example} } which is ??? – JimmyJames Aug 01 '16 at 13:54

2 Answers


It would be easier to do a negative match. That is, look for what is not allowed:

\\{\\{[^}]*\\{

and use Matcher.find(). Trying to do this with a positive match is difficult for the same reason using regex against XML is problematic.
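As a rough illustration of the negative-match idea, here is a minimal sketch that runs the forbidden-sequence pattern over the four examples from the question. The class and method names (`NestedBraceCheck`, `hasNestedOpen`) are made up for this example. Note that the check only looks for an opening brace inside an already open `{{` pair, so it does not catch every malformed input (a stray `}}` on its own, for instance), and it also flags `{{{my}}}`.

```java
import java.util.regex.Pattern;

class NestedBraceCheck {
    // Pattern from the answer: "{{", then any run of non-"}" characters, then another "{".
    private static final Pattern FORBIDDEN = Pattern.compile("\\{\\{[^}]*\\{");

    // Hypothetical helper: true when the forbidden sequence occurs anywhere in the input.
    static boolean hasNestedOpen(String input) {
        return FORBIDDEN.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "I have this {{example}} which is right",
            "Here {{is}} another {{example}} which is right",
            "{{ This {{ example }} is wrong }}",
            "{{ This }}example {{ is also { wrong }}"
        };
        for (String s : samples) {
            System.out.println((hasNestedOpen(s) ? "rejected: " : "accepted: ") + s);
        }
    }
}
```

With these inputs the first two strings are accepted and the last two are rejected, because `find()` only needs to locate the offending sequence somewhere in the string rather than describe the whole string.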

If you want to grab the contents of the braces, this should suffice:

\\{\\{(\\w+)\\}\\}
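To show how the contents could be collected, here is a small sketch that loops with `Matcher.find()` and reads group 1. It assumes the capture group is closed as `(\\w+)`. The names `BraceContents` and `extract` are invented for the example. Because `\w+` matches only word characters, a token with spaces such as `{{ example }}` would not be picked up by this version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceContents {
    // "{{", one or more word characters captured as group 1, then "}}".
    private static final Pattern TOKEN = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    // Hypothetical helper: collects the text of every {{word}} token, in order.
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [is, example]
        System.out.println(extract("Here {{is}} another {{example}} which is right"));
    }
}
```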
Community
JimmyJames

It is not clear what you mean by "make a pattern more simple". Precise patterns tend to be long; a shorter regex is rarely as precise.

In your case, you do not need to escape the { and } symbols inside the character classes (you fixed that after my first comment), and you do not need lazy quantifiers inside the expression unless you are using the captured values for some further processing. Since you do not seem to use them, use:

"^([^{}]*\\{\\{([^{}]*)\\}\\}[^{}]*)+$"

See Java demo:

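import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "I have this {{example}} which is right";
Pattern pattern = Pattern.compile("^([^{}]*\\{\\{([^{}]*)\\}\\}[^{}]*)+$");
Matcher matcher = pattern.matcher(s);
// matches() succeeds only if the whole string fits the pattern
if (matcher.matches()) {
    System.out.println(matcher.group(0)); // group(0) is the entire matched string
}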
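
To make the greedy versus lazy point concrete, here is a sketch that checks the question's four examples against the greedy pattern and against a lazy variant. The lazy pattern is my reconstruction for comparison only (every `*` replaced with `*?`), and the names `GreedyVsLazy` and `accepts` are made up. Since each negated class `[^{}]` can never consume a brace, both variants should accept and reject the same strings when used with `matches()`.

```java
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Greedy pattern from the answer.
    static final Pattern GREEDY =
        Pattern.compile("^([^{}]*\\{\\{([^{}]*)\\}\\}[^{}]*)+$");
    // Assumed lazy variant for comparison: every "*" replaced with "*?".
    static final Pattern LAZY =
        Pattern.compile("^([^{}]*?\\{\\{([^{}]*?)\\}\\}[^{}]*?)+$");

    // Hypothetical helper: true when the whole string fits the given pattern.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "I have this {{example}} which is right",
            "Here {{is}} another {{example}} which is right",
            "{{ This {{ example }} is wrong }}",
            "{{ This }}example {{ is also { wrong }}"
        };
        for (String s : samples) {
            System.out.println("greedy=" + accepts(GREEDY, s)
                + " lazy=" + accepts(LAZY, s) + " : " + s);
        }
    }
}
```

One thing worth checking against your requirements: because of the outer `+`, a string with no `{{...}}` pair at all is not accepted by either variant.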
Wiktor Stribiżew