I have a number of operations I want to "fuse" together. Let's say there are 3 possible operations:
```python
sq = lambda x: x**2
add = lambda x: x+3
mul = lambda x: x*5
```
I also have an array of operations:
```python
ops = [add, sq, mul, sq]
```
I can then create a function from these operations:
```python
def generateF(ops):
    def inner(x):
        for op in ops:
            x = op(x)
        return x
    return inner

f = generateF(ops)
f(3)  # returns 32400
```
```python
fastF = lambda x: (5*(x+3)**2)**2
```
`f` and `fastF` compute the same result, but `fastF` is around 1.7-2 times faster than `f` on my benchmark, which makes sense: `f` pays for a Python-level loop and one function call per operation. My question is, how can I write a `generateF` function that returns a function as fast as `fastF`? The operations are restricted to basic operations like `__add__`, `__mul__`, `__matmul__`, `__rrshift__`, etc. (essentially most numeric operations). `generateF` itself can take as long as you'd like, because it runs before the hot code is reached.
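For reference, the comparison was along these lines (a minimal `timeit` harness; the exact ratio varies by machine and Python version):

```python
import timeit

# Time both versions on the same input. Absolute numbers are
# machine-dependent; only the ratio matters here.
n = 1_000_000
t_slow = timeit.timeit("f(3)", globals={"f": f}, number=n)
t_fast = timeit.timeit("fastF(3)", globals={"fastF": fastF}, number=n)
print(t_slow / t_fast)  # roughly 1.7-2 on my machine
```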
The context is that this is part of my library, so I can define every legal operation and thus know exactly what they are. The operation definitions are not supplied by the end user (the user can only pick the order of the operations), so we can exploit any outside knowledge about them.
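For example, since every operation is known ahead of time, I imagine `generateF` could emit the fused body as a source string and compile it once. A rough sketch of that idea, reusing the toy operations above (the `OP_TEMPLATES` table and the names in it are placeholders of mine, not part of the library):

```python
# Map each known operation to an expression template; '{0}' stands for
# the expression built so far. Assumes sq/add/mul as defined above.
OP_TEMPLATES = {
    sq:  "({0})**2",
    add: "({0}) + 3",
    mul: "({0}) * 5",
}

def generateF(ops):
    expr = "x"
    for op in ops:
        expr = OP_TEMPLATES[op].format(expr)
    # Pay the code-generation and compilation cost once, outside the hot path.
    namespace = {}
    exec(f"def fused(x): return {expr}", namespace)
    return namespace["fused"]

f = generateF([add, sq, mul, sq])
f(3)  # 32400, with no per-operation call overhead
```

That works for these toy operations, but I don't know whether it is the right general mechanism, hence this question.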
This might seem like premature optimization, but it is not, as `f` is hot code. Dropping to C is not an option, because the operations can be complex (think PyTorch tensor multiplication) and `x` can be of any type. Currently I'm considering rewriting Python's bytecode directly, but that is very unpleasant, as the bytecode specification changes with every Python version, so I wanted to ask here first before diving into that solution.