4

I've a huge expression like:

x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52] + FUNC1(z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51] + FUNC0(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49)) + FUNC2(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49), a49, x49) + y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51] + FUNC1(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49) + w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC1(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53]) + RET(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53], x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + movr[54] + m1[54]) + RET(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49) + w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC1(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53]) + RET(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) ...

I need to simplify this expression by replacing some part of repeating expression with other variables. For example:

a = RET(v49, t49, z49) 
b= w49 + h49 + FUNC1(v49) + a + movr[50] + m1[50] 
and so on...

my problem is; this is really huge expression (like 2MB long expression) and doing this manually is near impossible and also without mistakes.

now my question is; is there any app that'll do such thing? or any python program to do so?

I can program python easily, but I lack of such algorithm knowing.

any help appreciated.

HabibS
  • 382
  • 6
  • 14
  • It's a bit of coding, but I would probably write a datatype by overloading the operators and function, which creates a symbolic graph. Then you can recognize common expressions in that graph , before dumping it again. If you wouldn't have functions in there, my `saltype` lib would do exactly what you need. Still, feel free to browse the code for ideas: https://saltype.readthedocs.io/en/latest/api.html#simplify. Maybe `casadi` can help you, as it somehow handles functions: https://web.casadi.org/ – Dr. V Jul 29 '23 at 13:24
  • @Dr.V I'll give it a try. – HabibS Jul 29 '23 at 13:33
  • It sounds like you're trying to identify repeated substrings in a larger string, in which case something like https://stackoverflow.com/questions/37499968 might be useful to read. – Mike 'Pomax' Kamermans Jul 29 '23 at 15:20

1 Answers1

3

The following function extracts all function calls and puts them into variables.

def simplify(progstr, variable_prefix='x'):
    progstr = f' {progstr} '
    prog = []
    while progstr.count('(') > 0:
        for i, c in enumerate(progstr):
            if c == ')':
                c2, i2 = None, i
                while c2 != '(':
                    i2 -= 1
                    c2 = progstr[i2]
                i2 -= 1
                c2 = progstr[i2]
                while c2 not in [',', ' ', '(', ')']:
                    i2 -= 1
                    c2 = progstr[i2]
                variable = progstr[i2+1:i+1]
                vname = f'{variable_prefix}{str(len(prog))}'
                progstr = progstr.replace(variable, vname)
                prog.append(f'{vname} = {variable}')
                break
    prog.append(progstr[1:-1])
    return '\n'.join(prog)

expression = 'x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]'

print(simplify(expression, 'x'))

prints

x0 = FUNC1(v49)
x1 = RET(v49, t49, z49)
x2 = FUNC1(w49 + h49 + x0 + x1 + movr[50] + m1[50])
x3 = RET(w49 + h49 + x0 + x1 + movr[50] + m1[50], v49, t49)
x4 = FUNC1(y49 + z49 + x2 + x3 + movr[51] + m1[51])
x5 = RET(y49 + z49 + x2 + x3 + movr[51] + m1[51], w49 + h49 + x0 + x1 + movr[50] + m1[50], v49)
t49 + x4 + x5 + movr[52] + m1[52]

Next to making the code more readable, this allows avoiding much repeated computation, which should speed it up a lot (especially if the individual function calls are costly), e.g. here FUNC1(v49) gets executed only once as opposed to 5 times.

(Edit): How it works:

While there are parentheses in the expression, do the following: Go through the expression from left to right, until you encounter a closing bracket (call this location j), then walk to the left until you encounter an opening bracket, then walk to the left until you encounter a whitespace, comma or bracket (and call this location i). The segment expression[i:j] then marks the first function call. Then simply replace each occurrence expression[i:j] in expression by a variable name x and add x = expression[i:j] to your list of variables.

Some remarks on the code:

  • it is only briefly tested on some valid and well-formatted code, e.g. it assumes a space after each comma, a space around each operator (+, *, ..., no space), no space before a closing bracket, no space after an opening bracket, ...),
  • it also assumes that the variable prefix does not collide with the variable and function names in the expression (can be made more robust either by knowing about the names and providing some known-to-be safe prefix or by adjusting the code to itself select a prefix that is valid),
  • it further assumes that f(*args) (for a function f and some constant arguments args) always returns the same result, no matter what was computed beforehand, and
  • it also creates variables for sub-expressions with only a single occurrence, which may not be desirable (but this should easily be changeable too, by some small adaptations of the code).
Michael Hodel
  • 2,845
  • 1
  • 5
  • 10