-1

I am trying to convert formulas from Excel to Python.

For example, AVERAGE(D1:D4) can work in excel but not in Python. For python it would be np.mean([D1,D2,D3,D4]). May I know how to create a function that can convert from AVERAGE(D1:D4) to np.mean([D1,D2,D3,D4]) and work in all scenario.

Christian Baumann
  • 3,188
  • 3
  • 20
  • 37
Aaron Teo
  • 23
  • 3
  • 1
    What do you mean by "all scenarios"? For that you would need to give way more examples of inputs and expected outcomes. – Christian Baumann Sep 30 '20 at 08:55
  • @ChristianBaumann all scenario basically the formula is longer with other formula. For example, AVERAGE(D1:D4)+2 -> np.mean([D1,D2,D3,D4])+2 | AVERAGE(D1:D4)+2+AVERAGE(D1:D2) -> np.mean([D1,D2,D3,D4)]+2+np.mean([D1,D2]) – Aaron Teo Sep 30 '20 at 09:03
  • 2
    For that you'd need to write an Excel formula interpreter in Python. As far as I know, no such thing exists. – L3viathan Sep 30 '20 at 09:03
  • How I want to execute is basically finding the AVERAGE formula in the string. Followed by getting the inner bracket value and split by ':'. After that I find the range of value and expand the range(D1:D3 -> D1,D2,D3). The problem is idk how to execute. After that just replace('Average','np.mean') – Aaron Teo Sep 30 '20 at 09:09

2 Answers2

0

Stop thinking in Excel and start thinking Python lists / arrays:

import numpy as np

d = [4, 8, 12, 18]
print("all elements", np.mean(d))
print("middle two", np.mean(d[1:3]))
print("last two", np.mean(d[-2:]))
print("all after 1st", np.mean(d[1:]))
# etc...
# output:
all elements 10.5
middle two 10.0
last two 15.0
all after 1st 12.666666666666666
Cedric Druck
  • 1,032
  • 7
  • 20
0

Given AVERAGE(D1:D4) you can evaluate the argument of the Excel command D1:D4 with eval() in Python. Here is an example of how you could parse AVERAGE(D1:D4)

import numpy as np
import re

excel_command = "AVERAGE(D1:D4)"

D0, D1, D2, D3, D4, D5 = [i for i in range(2, 8)]

def excel_to_py(excel_command):
    if excel_command[:7] == 'AVERAGE':
        command = excel_command[7:][1:-1] # strip command of 'AVERAGE' and parenthesis
        # handle range (e.g. D1:D4)
        range_command = command.split(':')
        # determine the column id, in example it's 'D'
        column = range_command[0][0]
        if len(range_command) == 2: # if splitting by ':' gave two elements, it's a range
            # get range: e.g. for D245:D366 we need only numerical values 245 and 366 for rows
            complete_range = list(range(int(range_command[0][1:]), int(range_command[1][1:]) + 1))
            print(complete_range)
            argument = ', '.join([column+str(n) for n in complete_range])
            print(argument)
            average = np.mean(eval(argument)) # evaluating the
    return average


average = excel_to_py(excel_command)
print(average)

Output is:

[1, 2, 3, 4]
D1, D2, D3, D4
4.5

where 4.5 is equal to np.mean(D1, D2, D3, D4) = (3 + 4 + 5 + 6)/4 = 4.5

Of course this is a basic example, where:

  • the range (D1:D4) is only applied over a single column (D).
  • only the range is handled, and not simply multiple elements (AVERAGE(D1, D3, D4))
    • you could easily implement that in a similar way to what I provided above.
  • other commands are not handled (summing, subtracting, etc...).
    • again, follow my example above and have fun implementing those too if needed.
  • only a single AVERAGE operation is handled, instead of an entire, more complex, expression of multiple different operations.
    • it's simply a matter of parsing the input command and handling the operations in the right order.

Good luck!

Rocco Fortuna
  • 300
  • 1
  • 11