How to create a function to expand formulas?

Question

I am trying to convert formulas from Excel to Python.

For example, AVERAGE(D1:D4) can work in excel but not in Python. For python it would be np.mean([D1,D2,D3,D4]). May I know how to create a function that can convert from AVERAGE(D1:D4) to np.mean([D1,D2,D3,D4]) and work in all scenario.

What do you mean by "all scenarios"? For that you would need to give way more examples of inputs and expected outcomes. — Christian Baumann, Sep 30 '20 at 08:55
@ChristianBaumann all scenario basically the formula is longer with other formula. For example, AVERAGE(D1:D4)+2 -> np.mean([D1,D2,D3,D4])+2 | AVERAGE(D1:D4)+2+AVERAGE(D1:D2) -> np.mean([D1,D2,D3,D4)]+2+np.mean([D1,D2]) — Aaron Teo, Sep 30 '20 at 09:03
For that you'd need to write an Excel formula interpreter in Python. As far as I know, no such thing exists. — L3viathan, Sep 30 '20 at 09:03
How I want to execute is basically finding the AVERAGE formula in the string. Followed by getting the inner bracket value and split by ':'. After that I find the range of value and expand the range(D1:D3 -> D1,D2,D3). The problem is idk how to execute. After that just replace('Average','np.mean') — Aaron Teo, Sep 30 '20 at 09:09

score 0 · Answer 1 · answered Sep 30 '20 at 09:17

Stop thinking in Excel and start thinking Python lists / arrays:

import numpy as np

d = [4, 8, 12, 18]
print("all elements", np.mean(d))
print("middle two", np.mean(d[1:3]))
print("last two", np.mean(d[-2:]))
print("all after 1st", np.mean(d[1:]))
# etc...
# output:
all elements 10.5
middle two 10.0
last two 15.0
all after 1st 12.666666666666666

Rocco Fortuna · Answer 2 · 2020-09-30T09:48:40.780

Given AVERAGE(D1:D4) you can evaluate the argument of the Excel command D1:D4 with eval() in Python. Here is an example of how you could parse AVERAGE(D1:D4)

import numpy as np
import re

excel_command = "AVERAGE(D1:D4)"

D0, D1, D2, D3, D4, D5 = [i for i in range(2, 8)]

def excel_to_py(excel_command):
    if excel_command[:7] == 'AVERAGE':
        command = excel_command[7:][1:-1] # strip command of 'AVERAGE' and parenthesis
        # handle range (e.g. D1:D4)
        range_command = command.split(':')
        # determine the column id, in example it's 'D'
        column = range_command[0][0]
        if len(range_command) == 2: # if splitting by ':' gave two elements, it's a range
            # get range: e.g. for D245:D366 we need only numerical values 245 and 366 for rows
            complete_range = list(range(int(range_command[0][1:]), int(range_command[1][1:]) + 1))
            print(complete_range)
            argument = ', '.join([column+str(n) for n in complete_range])
            print(argument)
            average = np.mean(eval(argument)) # evaluating the
    return average


average = excel_to_py(excel_command)
print(average)

Output is:

[1, 2, 3, 4]
D1, D2, D3, D4
4.5

where 4.5 is equal to np.mean(D1, D2, D3, D4) = (3 + 4 + 5 + 6)/4 = 4.5

Of course this is a basic example, where:

the range (D1:D4) is only applied over a single column (D).
- you could expand it with ranges over columns as well, with help from this post: range over character in python
only the range is handled, and not simply multiple elements (AVERAGE(D1, D3, D4))
- you could easily implement that in a similar way to what I provided above.
other commands are not handled (summing, subtracting, etc...).
- again, follow my example above and have fun implementing those too if needed.
only a single AVERAGE operation is handled, instead of an entire, more complex, expression of multiple different operations.
- it's simply a matter of parsing the input command and handling the operations in the right order.

Good luck!

How to create a function to expand formulas?

2 Answers2