Context

I am trying to cache executions in a data processing framework (kedro). For this, I want to develop a unique hash for a python function to determine if anything in the function body (or the functions and modules this function calls) has changed. I looked into __code__.co_code. While that nicely ignores comments, spacing etc, it also doesn't change when two functions are obviously different. E.g.

def a():
  a = 1
  return a

def b():
  b = 2
  return b

assert a.__code__.co_code != b.__code__.co_code

fails. So the byte code for these two functions is equal.

The ultimate goal: Determine if either a function's code or any of its data inputs have changed. If not and the result already exists, skip execution to save runtime.

Question: How can one get a fingerprint of a function's code in Python?

Another idea brought forward by a colleague was this:

import dis

def compare_instructions(func1, func2):
    """Compare the instructions of two functions."""
    func1_instructions = list(dis.get_instructions(func1))
    func2_instructions = list(dis.get_instructions(func2))

    # zip() stops at the shorter list, so check the lengths first
    assert len(func1_instructions) == len(func2_instructions)

    # compare every attribute of instructions except for starts_line
    for line1, line2 in zip(func1_instructions, func2_instructions):
        assert line1.opname == line2.opname
        assert line1.opcode == line2.opcode
        assert line1.arg == line2.arg
        assert line1.argval == line2.argval
        assert line1.argrepr == line2.argrepr
        assert line1.offset == line2.offset

    return True

This seems rather like a hack. Other tools like pytest-testmon try to solve this as well but they appear to be using a number of heuristics.

pascalwhoop
  • Does this answer your question? [How do I check if a python function changed (in live code)?](https://stackoverflow.com/questions/18134087/how-do-i-check-if-a-python-function-changed-in-live-code) – abhigyanj Oct 26 '20 at 10:05
  • The reason those two functions compare as equal is that you haven't included in the comparison their constant tuple. – Dan D. Oct 26 '20 at 10:06
  • I'm curious, why is it necessary to look at actual changes in the written code? Is the code going to change between executions of the method? My approach to this would be to create a cache key based on all the values of arguments received by a function and, if it depends on state outside the function, on those variables as well. If a cache is built up and then the actual code changes, all the recorded cache values become irrelevant anyway: the code won't be run in the same way it was before the change. – Jack Walton Oct 26 '20 at 10:11
  • @AbhigyanJaiswal it doesn't really, in the sense that there's no answer in that question. The first answer suggests that it's like reinventing AST, the second answer is actually wrong (see code above) – pascalwhoop Oct 26 '20 at 10:39

1 Answer

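__code__.co_code returns the raw bytecode, which refers to constants by their index in __code__.co_consts rather than embedding their values. Ignore the constants (and the local variable names, which are likewise stored separately in __code__.co_varnames) and your two functions are the same.

__code__.co_consts contains the constants themselves, so it needs to be accounted for in your comparison.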

assert a.__code__.co_code != b.__code__.co_code \
       or a.__code__.co_consts != b.__code__.co_consts
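
To see where the difference actually lives, one can print the relevant attributes of both code objects. This is only an inspection sketch using the a and b from the question; the exact bytecode and the layout of co_consts vary between CPython versions, so no particular output is assumed here.

```python
import dis


def a():
    a = 1
    return a


def b():
    b = 2
    return b


# Attributes of the code object that sit next to the raw bytecode.
for attr in ("co_code", "co_consts", "co_varnames"):
    left = getattr(a.__code__, attr)
    right = getattr(b.__code__, attr)
    print(f"{attr}: equal={left == right} a={left!r} b={right!r}")

# dis resolves the indices, so the disassembly shows the actual values.
dis.dis(a)
dis.dis(b)
```

Whatever the interpreter version, the local variable names end up in co_varnames rather than in co_code, which is one more attribute a comparison may want to include.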

Looking at inspect highlights a few other considerations for 'sameness'. For example, to ensure the functions below are considered different, default arguments must be accounted for.

def a(a1, a2=1):
    return a1 * a2

def b(b1, b2=2):
    return b1 * b2
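
Default values are stored on the function object rather than on its code object, so neither co_code nor co_consts sees them. A minimal sketch of where to look, using the two functions above:

```python
def a(a1, a2=1):
    return a1 * a2


def b(b1, b2=2):
    return b1 * b2


# Positional defaults live in __defaults__, keyword-only ones in __kwdefaults__.
print(a.__defaults__, a.__kwdefaults__)
print(b.__defaults__, b.__kwdefaults__)

# A comparison of functions therefore has to look at these attributes too.
assert a.__defaults__ != b.__defaults__
```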

One way to fingerprint a function within a single interpreter run is to use the built-in hash function. Assuming the same function definitions as in the OP's example:

def finger_print(func):
    return hash(func.__code__.co_consts) + hash(func.__code__.co_code)

assert finger_print(a) != finger_print(b)
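
For a cache that has to survive interpreter restarts, hash() is a poor fit because its result for bytes and str is salted per process by default. A possible alternative is a cryptographic digest from hashlib. The sketch below is an illustration, not a complete solution: the names stable_fingerprint and _feed_code are made up here, it relies on repr() of constants and defaults being deterministic (which may not hold for set-like constants or for defaults whose repr contains a memory address), and it does not follow calls into other functions or modules.

```python
import hashlib
import types


def _feed_code(hasher, code):
    """Feed the behaviour-relevant parts of a code object into hasher."""
    hasher.update(code.co_code)
    for names in (code.co_names, code.co_varnames,
                  code.co_freevars, code.co_cellvars):
        hasher.update(repr(names).encode())
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested function, lambda or comprehension: recurse instead of
            # using repr(), which would include a memory address.
            _feed_code(hasher, const)
        else:
            hasher.update(repr(const).encode())


def stable_fingerprint(func):
    """Return a hex digest of func's code and default arguments."""
    hasher = hashlib.sha256()
    _feed_code(hasher, func.__code__)
    hasher.update(repr(func.__defaults__).encode())
    hasher.update(repr(func.__kwdefaults__).encode())
    return hasher.hexdigest()


def a():
    a = 1
    return a


def b():
    b = 2
    return b


assert stable_fingerprint(a) != stable_fingerprint(b)
```

Bytecode differs between Python versions, so a digest like this is only comparable across runs that use the same interpreter version.
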
Guy Gangemi
  • Note that on the more recent Python versions, hash values in fact change for every new run of Python. See https://stackoverflow.com/questions/17585730/what-does-hash-do-in-python – Nir May 26 '23 at 12:58