What's the best way to turn these nested strings in prefix notation into tuples?

Question

I would like a general solution to transforming strings that are in prefix notation into tuples. Here are three examples below that illustrate what I hope to achieve. Each string is an element in a list (only the second example has more than one element).

['neg(or(p,q))']
['imp(p,q)', 'imp(neg(r),neg(q))']
['and(and(p,q),r)']

Becomes

[('neg',('or','p','q'))]
[('imp','p','q'), ('imp',('neg','r'),('neg','q'))]
[('and',('and','p','q'),'r')]

More generally, the format I desire is ('connective', 'parameter', 'parameter') where connectives may be 'and', 'or', 'iff', 'imp', 'neg'. Each of these is then followed by its two parameters, except for 'neg' which takes just one. Parameters may be other tuples as shown above in the examples, which demonstrates nested formulae.

Another way to look at this is with the formulae that motivated the problem in the first place.

Ex. 1 ((p ∧ q) ∨ (¬p ∧ ¬q)) ultimately becomes

[('or',('and','p','q'),('and',('neg','p'),('neg','q')))]

Ex. 2 (p → q), (¬r → ¬q) ultimately becomes

[('imp','p','q'), ('imp',('neg','r'),('neg','q'))]

What I have tried:

import re

def tuplify(raw_sequent):

    first_split = re.split('\(', raw_sequent, 1)
    connective = first_split[0]
    second_split = re.split('\,', first_split[1])
    right_arg = second_split[-1]
    second_split = second_split[:-1]
    left_arg = ','.join(second_split)
    tup = (connective, left_arg, right_arg[:-1])

    return tup

However, this does not generalize well at all, only working for the input

'and(and(p,q),or(r))'

I imagine the solution might require a combination of regex and recursion.

I think you need to more clearly lay out your rules of how to parse and convert to tuples — Chris Doyle, Oct 15 '19 at 11:28

Charif DZ · Accepted Answer · 2019-10-15T14:08:10.743

This a solution using Regex and String manipulation Hope the comments explains what is going on:

import re


def convert_to_tuple(text):
    """ Take an expression and convert it to tuple
      'neg(or(p,q))'  -->  ('neg', ('or', 'p', 'q'))
    """
    # used to extract not nested expression
    pattern = re.compile('(\w+)\(([^\(]*?)\)')
    # extract the index of the expression
    index_pattern = re.compile('#(\d+)#')
    # contains all expression extracted from the text
    expressions = []
    # you only need to extract most global expression only
    global_expressions = []


    def fix_expression(expression):
        """  a recursive solution to rebuild nested expression. """
        if isinstance(expression, str):
            # if the expression is like #index# return the correct expression else return this str expression
            m = index_pattern.search(expression)
            if m:
                return tuple(fix_expression(expressions[int(m.group(1))]))
            return expression
        # if the expression is a tuple extract all fixed nested expression in a tuple
        return tuple(fix_expression(subexp) for subexp in expression)


    def extract_expression(code):
        """ keep extracting nested expressions list,  """

        def add_exp(match):
            """ handle the match return by sub to save the expression and return its index."""
            expressions.append(None)
            index = len(expressions) - 1
            name, args = match.groups()

            if ',' in args:
                # if what is between parentheses contains ',' split is
                args = tuple(args.split(','))
            else:
                # args should always be a tuple to support unpacking operation
                args = (args,)

            # expression transformed to a tuple
            expressions[index] = (name, *args)
            return '#{}#'.format(index)

        while pattern.search(code):
            code = re.sub(pattern, add_exp, code)

        # for debuging string after removing all expression
        # print(code)


    # this extract all nested expression in the  text
    extract_expression(text)

    # Global expression there index is not used in any other expression
    # the last expression is for sure a global one because.
    global_expressions.append(expressions[-1])
    for i, exp in enumerate(expressions[:-1]):
        for other_exp in expressions[i + 1:]:
            if any('#{}#'.format(i) in sub_exp for sub_exp in other_exp):
                break
        else:
            # this expression is not used in any expression it's a global one
            global_expressions.append(exp)

    # for debugging
    # print(expressions)
    # print(global_expressions)

    return fix_expression(global_expressions[0])


print([convert_to_tuple(x) for x in ['neg(or(p,q))']])  # [('neg', ('or', 'p', 'q'))]
print([convert_to_tuple(x) for x in ['imp(p,q)', 'imp(neg(r),neg(q))']])  # [('imp', 'p', 'q'), ('imp', ('neg', 'r'), ('neg', 'q'))]
print([convert_to_tuple(x) for x in ['and(and(p,q),r)']])  # [('and', ('and', 'p', 'q'), 'r')]

The trick in nested expression when we use Regex is I use re.sub to extract a not nested expression first to a list and replace it with its index. keep doing that until there is no expression to extract. then handle the list of extracted expression to do the required Job.

check my answer on similar question, to have a clear Idea on this technique:

Answer for Extract all variables from a string of Python code (regex or AST)

score 1 · Answer 2 · answered Oct 15 '19 at 12:45

The following code

import re

OPERATORS_DIC = {'and': 2, 'or': 2, 'iff': 2, 'imp': 2, 'neg': 1}
EXPRESSIONS = [ 'neg(or(p,q))'
                , 'imp(p,q)'
                , 'imp(neg(r),neg(q))'
                , 'and(and(p,q),r)' ]

def digest(expression):
    """expression is an expression that should be recursively digested
    """
    for op in OPERATORS_DIC:
        match = re.search( op + '\((.*)\)', expression )
        if not match:
            continue
        inside = match.groups()[0]
        # examples: inside is 'p,q' or 'p,neg(q)', or 'p,neg(and(q,r))'
        if OPERATORS_DIC[op] == 1:
            return (op, digest(inside))
        # else we try to get the two arguments by counting parentheses
        for k in range(len(inside)):
            if inside[k] != ',':
                continue
            before, after = inside[:k], inside[k+1:]
            if ( before.count('(') == before.count(')') and
                 after .count('(') == after .count(')') ):
                return (op, digest(before), digest(after))
    return expression


for expr in EXPRESSIONS:
    print('{} -> {}'.format(expr, digest(expr)))

gives:

neg(or(p,q)) -> ('neg', ('or', 'p', 'q'))
imp(p,q) -> ('imp', 'p', 'q')
imp(neg(r),neg(q)) -> ('imp', ('neg', 'r'), ('neg', 'q'))
and(and(p,q),r) -> ('and', ('and', 'p', 'q'), 'r')

So the needed output for the input of the EXPRESSIONS is

[ digest(expr) for expr in EXPRESSIONS ]

score 1 · Answer 3 · answered Oct 15 '19 at 16:14

Shorter recursive solution with a generator:

import re
def to_tuple(d, stop=None):
   a = next(d, None)
   if a and a != stop:
      b = next(d, None)
      if b is None or b == stop:
        yield a
      else:
         if b == '(':
            yield (a, *to_tuple(d, stop=')'))
         else:
            yield from filter(None, [a, b])
         yield from to_tuple(d, stop=stop)

def format_input(_d):
   return iter(re.findall('\(|\)|\w+', _d))

n = [['neg(or(p,q))'], ['imp(p,q)', 'imp(neg(r),neg(q))'], ['and(and(p,q),r)']]
r = [[tuple(to_tuple(format_input(i)))[0] for i in k] for k in n]

Output:

[[('neg', ('or', 'p', 'q'))], 
[('imp', 'p', 'q'), ('imp', ('neg', 'r'), ('neg', 'q'))], 
[('and', ('and', 'p', 'q'), 'r')]]

score 0 · Answer 4 · edited Oct 15 '19 at 14:42

0

You problem seems a good fit for a recursive solution:

def tuplify(raw_sequent):
    numOpen = 0
    numClose = 0

    startBracket = 0
    stopBracket = 0
    for i in raw_sequent:
        if i == "(":
            numOpen += 1

        if i == ")":
            numClose += 1

        if numOpen == 0:
            startBracket += 1
            stopBracket += 1
        elif numOpen > numClose:
            stopBracket += 1

    begin = raw_sequent[:startBracket]
    middle = raw_sequent[startBracket+1:stopBracket]
    end = raw_sequent[stopBracket+1:]

    result = []
    if len(begin) > 0:
        result += [i for i in begin.split(",") if len(i) > 0]

    if len(middle) >0:
        result.append(tuplify(middle))

    if len(end) > 0:
        result.append(tuplify(end))

    return tuple(result)

edited Oct 15 '19 at 14:42

Matt Seymour

8,880
7
60
101

answered Oct 15 '19 at 11:25

Tom Nijhof

542
4
11

is this an algorithm ?! – Charif DZ Oct 15 '19 at 11:31
More the general idea. Didn't had time to go in to full details, give me moment – Tom Nijhof Oct 15 '19 at 12:01
This should be a better answer about what is going on – Tom Nijhof Oct 15 '19 at 12:21

What's the best way to turn these nested strings in prefix notation into tuples?

4 Answers4