13

I am working on building formula reference map from spreadsheet xml using python. formula is like

=IF(AND(LEN(R[-2]C[-1])>0,R[-1]C),WriteCurve(OFFSET(R16C6, 0,0,R9C7,R10C7),R15C6,R10C3, R8C3),"NONE")

I am only interested in getting nth argument of writecurve function. here i comes up very C style program basically counting coma which is not inside bracket. there are a lot of nested formula

def parseArguments(t, func, n):
start=t.find(func)+len(func)+1
bracket = 0
ss = t[start:]
lastcomma = 0
for i, a in enumerate(ss):
    if a=="(":
        bracket +=1
    elif a==")":
        if bracket==0:
            break
        bracket-=1
    elif a == ",":
        if bracket==0 and n==0:
            break
        elif bracket ==0:
            if n-1==0:
                lastcomma = i
            n-=1
if lastcomma == 0:
    return ss[:i]
else:
    return ss[lastcomma+1:i]

Is there pythonic way of doing this? or is there a better recursive way to parse the whole formula? Many thanks

archlight
  • 687
  • 1
  • 6
  • 12

1 Answers1

17

The best Excel formula parser I'm aware of is E. W. Bachtal's algorithm. There's a Python port by Robin Macharg; the most recent version I know about is part of the pycel project, but it can be used standalone - ExcelFormula. It has no problem parsing your formula:

from pycel.excelformula import ExcelFormula, FunctionNode
formula = ExcelFormula('=IF(AND(LEN(R[-2]C[-1])>0,R[-1]C),WriteCurve(OFFSET(R16C6, 0,0,R9C7,R10C7),R15C6,R10C3, R8C3),"NONE")')
>>> formula.ast
FunctionNode<IF>

You can then walk the AST to find the WriteCurve node and examine its arguments:

write_curve = next(node for node, _ in formula.ast.descendants if isinstance(node, FunctionNode) and node.value.strip('(') == 'WriteCurve')
>>> write_curve.children[2].value
'R10C3'
ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • Visit https://openpyxl.readthedocs.io/en/stable/formula.html?highlight=token#parsing-formulas for something that works in 2020 – Motin Dec 07 '20 at 08:40
  • @Motin that only does basic tokenizing, you still need pycel to shunting-yard it into RPN and then AST. I'll look at the current API there. – ecatmur Dec 07 '20 at 09:42
  • @Motin updated to current pycel API. btw thanks for pointing out the need for an update! – ecatmur Dec 07 '20 at 09:52