I'm trying to match JSON condition strings in Java and have been banging my head against a wall looking for a solution. In these strings, a condition is a JSON object "{}", which may be evaluated on its own or as part of a group. A group is represented as an array of condition objects "[]". I had a solution that matched the contents between the delimiters for groups and objects respectively, but it fell apart as soon as I tried to store groups inside other groups.

Example string:

[{"field":"type","operand":"=","value":"column"}, "&&",
  [{"field":"type","operand":"=","value":"column"}, "||", 
   {"field":"type","operand":"=","value":"column"}], "||", 
   {"field":"type","operand":"=","value":"column"}]

The intended outcome was to match the contents of the string between (and including) the characters [ and ].

I've worked out that what I need is to match the contents of [ ... ] until there is a ] that is not followed (somewhere ahead, i.e., .+?) by another ] before a [. A lazy search for ] stops the match at the first occurrence, while a greedy match runs all the way to the last occurrence, which may span many other groups/objects. I've experimented with lookbehinds/lookaheads, but these require statically defined character positions, and cannot stretch to find (or not find) an occurrence of a character.

At this point, I'm stumped and would greatly appreciate any advice you have to offer.

Wiktor Stribiżew
J055Y

1 Answer


Technically possible with a single regex, but probably not worth it. Note that the linked answer is for one type of bracket only; your problem is harder because you have both {} and [].

The problem is that nested expressions are not a regular language. Regex libraries can sometimes parse non-regular languages with the help of extensions (like forward and backward references), but the results are generally fragile and unreadable. This issue is quite famous on Stack Overflow.

Since your input is already a JSON string, you're much better off using a real JSON parser. I promise you it'll be much less painful than what you've already done.


If you really must use regex, I suggest doing it iteratively:

  1. Create a regex that finds all {} or [] that don't have any {} or [] inside.
  2. Search for matches in your input string.
  3. Replace each match with a unique token, like "TOKEN_N". Record the matched string for each of these tokens.
  4. Repeat steps 2-3 until there are no more matches.

In the end you'll have transformed the input into something like

[TOKEN1, "&&", TOKEN2, "||", TOKEN3]

And a recursive dictionary of everything you replaced:

TOKEN1={"field":"type","operand":"=","value":"column"}
TOKEN2=[TOKEN4, "||", TOKEN5]
TOKEN3={"field":"type","operand":"=","value":"column"}
TOKEN4={"field":"type","operand":"=","value":"column"}
TOKEN5={"field":"type","operand":"=","value":"column"}

This is a stringified Abstract Syntax Tree of your expression, and from here you can process it however you like.
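
The steps above could be sketched in Java roughly as follows. This is a minimal illustration, not a robust parser: the class name `BracketTokenizer`, the method `tokenize`, and the `TOKENn` naming are made up for the example, and it assumes that string values never contain the characters `{`, `}`, `[` or `]`. Unlike the illustration above, the loop keeps going until the outermost array has been replaced as well, so it returns a single root token and the numbering differs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTokenizer {
    // Step 1: a {...} or [...] span with no other braces or brackets inside.
    // Assumption: string values never contain '{', '}', '[' or ']'.
    private static final Pattern INNERMOST =
            Pattern.compile("\\{[^{}\\[\\]]*\\}|\\[[^{}\\[\\]]*\\]");

    // Collapses the input from the inside out. Every replaced span is stored
    // in 'tokens' under its token name; the returned string is what remains
    // once no bracketed span is left (a single root token for one expression).
    static String tokenize(String input, Map<String, String> tokens) {
        String current = input;
        while (true) {
            Matcher m = INNERMOST.matcher(current);   // step 2: search
            StringBuilder sb = new StringBuilder();   // Matcher + StringBuilder needs Java 9+
            boolean found = false;
            while (m.find()) {
                found = true;
                String name = "TOKEN" + (tokens.size() + 1);
                tokens.put(name, m.group());          // step 3: remember the match
                m.appendReplacement(sb, name);        // step 3: replace it with the token
            }
            if (!found) {
                return current;                       // step 4: nothing left to replace
            }
            m.appendTail(sb);
            current = sb.toString();
        }
    }

    public static void main(String[] args) {
        String input = "[{\"a\":1}, \"&&\", [{\"b\":2}, \"||\", {\"c\":3}], \"||\", {\"d\":4}]";
        Map<String, String> tokens = new LinkedHashMap<>();
        String root = tokenize(input, tokens);
        System.out.println("root = " + root);
        tokens.forEach((name, text) -> System.out.println(name + "=" + text));
    }
}
```

For the shortened input in `main`, the first pass replaces the four objects, the second pass replaces the inner group, and the third pass replaces the outer group, so the root comes back as `TOKEN6` and the map holds six entries. Walking the map from the root token downwards gives you the tree.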

BoppreH
    Thank you very much! In the end I created a custom Json deserialiser to handle the different combination of objects in my array. I can't believe I didn't think of this earlier. – J055Y Apr 26 '23 at 15:41