Context
I am trying to cache executions in a data processing framework (kedro). For this, I want to develop a unique hash for a python function to determine if anything in the function body (or the functions and modules this function calls) has changed. I looked into __code__.co_code
. While that nicely ignores comments, spacing etc, it also doesn't change when two functions are obviously different. E.g.
def a():
a = 1
return a
def b():
b = 2
return b
assert a.__code__.co_code != b.__code__.co_code
fails. So the byte code for these two functions is equal.
The ultimate goal: Determine if either a function's code or any of its data inputs have changed. If not and the result already exists, skip execution to save runtime.
Question: How can one get a fingerprint of a functions code in python?
Another idea brought forward by a colleague was this:
import dis
def compare_instructions(func1, func2):
"""compatre instructions of two functions"""
func1_instructions = list(dis.get_instructions(func1))
func2_instructions = list(dis.get_instructions(func2))
# compare every attribute of instructions except for starts_line
for line1, line2 in zip(func1_instructions, func2_instructions):
assert line1.opname == line2.opname
assert line1.opcode == line2.opcode
assert line1.arg == line2.arg
assert line1.argval == line2.argval
assert line1.argrepr == line2.argrepr
assert line1.offset == line2.offset
return True
This seems rather like a hack. Other tools like pytest-testmon try to solve this as well but they appear to be using a number of heuristics.