0

I want to parse a logical expression like the following:

(f = '1' OR f = '2') AND (s = '3' OR s = '4' OR s = '5') AND (t = '6')

What I need, is a representation of this logical expression in the form of a expression tree. In the end I want to be able to create a JSON representation of this data structure in the form of:

{
    ...
    nodes: [
    {
        id: 1,
        operator: 'AND'
        operands: [
            2,
            3,
            4
        ]
    },
    {
        id: 2,
        operator: 'OR'
        operands: [
            5,
            6
        ]
    },
    {
        id: 3,
        operator: 'OR'
        operands: [
            7,
            8,
            9
        ]
    },
    leafs: [
    {
        id: 4,
        operator: '='
        operands: [
            t,
            6
        ]
    },
    ...
}

I am aware that this is not the best representation of the data structure, but this is just an example. How do I approach this problem using packages like pyparse or re?

quamrana
  • 37,849
  • 12
  • 53
  • 71
Ipsider
  • 553
  • 1
  • 7
  • 20
  • 1
    Are nested brackets possible? – markalex May 24 '23 at 14:38
  • Actually no. The brackets are only present on the first layer. – Ipsider May 24 '23 at 14:44
  • 1
    Then you can use [`\((.+?)\)\s*(OR|AND)?`](https://regex101.com/r/0N5Zu4/1) to extract first layer operations and operands and [`\s*(.+?)\s*=\s*'(.+?)'\s*(OR|AND)?`](https://regex101.com/r/oA7MEF/1) for inner operations. – markalex May 24 '23 at 14:51
  • A trick: You can tokenize the parenthesized part by using `re.split(' (OR|AND) ', expression)`, which results in `["s = '3'", 'OR', "s = '4'", 'OR', "s = '5'"]` for `expression = "s = '3' OR s = '4' OR s = '5'"` – InSync May 24 '23 at 16:27

2 Answers2

1

I assume your example is very basic and the real input will contain nesting. If so, I'd highly recommend using pyparsing module. It's an intermediate level between building your own parser and using regex. Like Ira Baxter mentioned, regex is not the answer since it can't truly handle nesting/recursion.

It has a good learning curve and has an amazing feature called infix_notation that can handle operator precedence, operator associativity and can work with unary/binary/ternary operators. For example, it can properly group expression like a or b and c which in most languages (at least those that I know) is a or (b and c).

Once you build a representation of what you want then you can use set_parse_action which is basically a way to transform the parsed elements when they are parsed. Using that you can make it into whatever output you want.

Here's a great book on how to use pyparsing: https://www.oreilly.com/library/view/getting-started-with/9780596514235/

0

If your boolean expression is always of the form A AND B AND C .... where A B C etc. are always of the form X OR Y OR Z, then you can do this by just hacking: pick out the OR phrases inside the parentheses, extract the and terms, and glue the or phrases under the top level and. You shouldn't need more detail than this.

If your boolean expressions can be arbitrary (e.g., A AND NOT (B OR NOT C AND D), then you'll need to build a parser, because regular expressions can't keep track of nested parentheses. See my highly rated SO answer on how to build a parser easily: https://stackoverflow.com/a/2336769/120163. The method is easily adapted to any programming language including Python.

That answer has a link to another answer that tells you how to build a parse tree from the parsing actions. This is what I think you want as an internal representation. You can walk over the parse tree with a recursive visitor to generate the JSON text you desire.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341