2

I have the following statement:

column1==1 && 'column2'=='value' || year == 2006 || ('column2'=='value1' && 'column2'!='value2')  

which is pass to me like the JSON below:

{
    "type": "ExpressionStatement",
    "expression": {
        "type": "LogicalExpression",
        "operator": "||",
        "left": {
            "type": "LogicalExpression",
            "operator": "||",
            "left": {
                "type": "LogicalExpression",
                "operator": "&&",
                "left": {
                    "type": "BinaryExpression",
                    "operator": "==",
                    "left": {
                        "type": "Identifier",
                        "name": "column1"
                    },
                    "right": {
                        "type": "Literal",
                        "value": 1,
                        "raw": "1"
                    }
                },
                "right": {
                    "type": "BinaryExpression",
                    "operator": "==",
                    "left": {
                        "type": "Literal",
                        "value": "column2",
                        "raw": "'column2'"
                    },
                    "right": {
                        "type": "Literal",
                        "value": "value",
                        "raw": "'value'"
                    }
                }
            },
            "right": {
                "type": "BinaryExpression",
                "operator": "==",
                "left": {
                    "type": "Identifier",
                    "name": "year"
                },
                "right": {
                    "type": "Literal",
                    "value": 2006,
                    "raw": "2006"
                }
            }
        },
        "right": {
            "type": "LogicalExpression",
            "operator": "&&",
            "left": {
                "type": "BinaryExpression",
                "operator": "==",
                "left": {
                    "type": "Literal",
                    "value": "column2",
                    "raw": "'column2'"
                },
                "right": {
                    "type": "Literal",
                    "value": "value1",
                    "raw": "'value1'"
                }
            },
            "right": {
                "type": "BinaryExpression",
                "operator": "!=",
                "left": {
                    "type": "Literal",
                    "value": "column2",
                    "raw": "'column2'"
                },
                "right": {
                    "type": "Literal",
                    "value": "value2",
                    "raw": "'value2'"
                }
            }
        }
    }
}

I want to extract from this JSON string, only the binary expression operands. For example, for column1==1 I need only this part of the JSON string:

"left": {
    "type": "Identifier",
    "name": "column1"
},
"right": {
    "type": "Literal",
    "value": 1,
    "raw": "1"
}

Having that string I can get the column name and the comparison value. As the JSON format string is consistent, I can use zero-width positive lookbehind assertion and check for "type": "BinaryExpression", value. Then I need to get all data to the second }.

Is there a way to say with regex to get everything to the second occurrence of }?

Note:

  1. I know it will be better to use some JSON to string parser, but I am not allowed (I am working in the context of SQL Server).
  2. I know in SQL Server 2016 there is native JSON support, but some of the database are on earlier version.
  3. I need pure regex solution, as in the context of the T-SQL I am not able to use .net like this one for example.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
gotqn
  • 42,737
  • 46
  • 157
  • 243
  • However, TSQL has no regex support. See [Using RegEx in SQL Server](http://stackoverflow.com/questions/8928378/using-regex-in-sql-server). You may use very limited *wildcard* patterns. I re-tagged the quesiton. – Wiktor Stribiżew May 09 '17 at 08:45
  • @WiktorStribiżew I have implemented several SQL CLR regex functions. I have regex replace, regex matches, regex split, etc. You can check this for more information - https://msdn.microsoft.com/en-us/library/ff878119.aspx – gotqn May 09 '17 at 08:48
  • Ok, but you did not mention it in the question. I put the regex tag back then. So, what have you tried? – Wiktor Stribiżew May 09 '17 at 08:50
  • If you use regexes, I guess Binary expression is always having left/right. Why not using that information? – tafia May 09 '17 at 08:56
  • @WiktorStribiżew I have try something like this `(?i)(?<="BinaryExpression")[^\}]+[^\}]+` but it stops the match to the first occurrence of `}` – gotqn May 09 '17 at 09:03
  • And how can you be sure you do not have `}` inside a string literal? You will have to match the string literals, too. And you may use capturing to *capture* the part of the pattern, to get rid of the lookbehind. – Wiktor Stribiżew May 09 '17 at 09:03
  • Yes, you are right. But let's say I replace all `}` symbols that are encapsulated with double quotes with some special character. Then, I can look only for second occurrence of `}`. – gotqn May 09 '17 at 09:05
  • This seems to work `(?i)(?<="BinaryExpression")[^\}]+.[^\}]+`. – gotqn May 09 '17 at 09:11
  • 1
    Cool, then use `(?i)(?<="BinaryExpression")[^}]+}[^}]+` if it works for you. – Wiktor Stribiżew May 09 '17 at 09:25

1 Answers1

1
\"type\": \"BinaryExpression\",\s*(\".+\": +\"?.+\"?,?\s*)*(\".+\": \{\s*(\".+\": +\"?.+\"?,?\s*)+\},?\s*)+

Seems to work okay. Explanation:

\"type\": \"BinaryExpression\",\s* Find text `"type": Binary Expression" followed by some number of whitespaces

(\".+\": +\"?.+\"?,?\s*)* Find string of format "something" : "something" with an optional , at the end. As there might be any number of those after Binary Expression, we add * at the end. Also, as some attributes do not have "" around them, the second qutation marks are optional. Once again followed by some whitespaces

(\".+\": {\s* string of format `"Something: {" whitespaces

(\".+\": +\"?.+\"?,?\s*)+ content of format "something": "else" where second quotation marks are optional. As there are many lines of this type, we follow this with +

},?\s*)+ those lines have to end with }, then an optional coma.

Try it here: https://regex101.com/r/IAgV0e/1

Xyzk
  • 1,332
  • 2
  • 21
  • 36