I want a Python script to be able to hash itself each time it is run. Is this possible without having to give the path to the script? I can see two ways to do this. The first is to hash the source Python text file. The second is to hash the compiled bytecode.

I see myself going with choice 2 so that raises a couple of other questions:

  1. Can a script determine where its compiled bytecode is from within the script?
  2. I'll ask this in a separate question.
starflyer
  • I have to wait 20 minutes to post the second question. – starflyer Oct 16 '13 at 20:21
  • What's the purpose of this? – NullUserException Oct 16 '13 at 20:21
  • I wanted to figure out whether a Python script had changed. Option 1 would be the most conservative, since it would flag even a change in indentation (spaces to tabs) that leaves the syntax structure, and therefore the behavior, of the script unchanged. Option 2 was to take an MD5 hash of the compiled bytecode file. Maybe there are solutions that I'm unaware of? – starflyer Oct 16 '13 at 20:29

2 Answers

A Python script can figure out its own path with:

import os

path = os.path.abspath(__file__)

after which you can open the source file and run it through hashlib.md5.
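
As a minimal sketch of that approach (written for Python 3; the helper names md5_of_file and md5_of_this_script are hypothetical, not from the answer), the file can be read in binary mode and fed to hashlib.md5 in chunks:

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_this_script():
    """Hash the file this code was loaded from (assumes __file__ is set)."""
    return md5_of_file(os.path.abspath(__file__))
```

Calling md5_of_this_script() at startup and comparing the result with a stored digest would show whether the source text changed. Reading in binary mode means any byte-level change, including whitespace, alters the digest.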

A script file has no compiled bytecode file; only modules do.

Note that in Python 2, the __file__ path uses the extension of the file that was actually loaded; for modules this is .pyc or .pyo only if there was a cached bytecode file ready to be reused. It is .py if Python had to compile the bytecode, either because no bytecode file was present or because the bytecode file was stale.
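
If the goal is to hash the source regardless of which file was loaded, one option is to map a bytecode path back to its .py sibling first. This is only a sketch under the Python 2 layout described above, where the bytecode file sits next to the source; the helper name source_path_for is hypothetical:

```python
import os


def source_path_for(loaded_path):
    """Map a Python 2 style .pyc/.pyo path back to its .py sibling.

    Assumes the bytecode file lives beside the source file; any other
    path is returned unchanged.
    """
    root, ext = os.path.splitext(loaded_path)
    if ext in (".pyc", ".pyo"):
        return root + ".py"
    return loaded_path
```

For example, source_path_for(os.path.abspath(__file__)) gives a path that can be opened and hashed whether or not a cached bytecode file was reused.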

You'll also have to take into account that your code may have been invoked with command line switches that alter what bytecode Python loads; if a -O or -OO switch is given, or the PYTHONOPTIMIZE environment variable is set, Python will load or compile to a .pyo file instead.
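
One way to sidestep cached bytecode files and optimization switches altogether is to compile the source in-process and hash the marshalled code object. This is a swapped-in technique, not something the answer proposes, and it is sketched here for Python 3; the function name bytecode_md5 and the placeholder filename are assumptions. The digest is only comparable between runs of the same Python version, because bytecode differs across versions.

```python
import hashlib
import marshal


def bytecode_md5(source_text, filename="<hashed>"):
    """Return the hex MD5 of the bytecode compiled from `source_text`.

    A fixed `filename` keeps the path out of the hash, and optimize=0
    pins the optimization level so -O/-OO do not change the result.
    """
    code = compile(source_text, filename, "exec", optimize=0)
    # Marshal format version 2 omits the object-reference flags used by
    # newer formats, which can otherwise vary between runs.
    payload = marshal.dumps(code, 2)
    return hashlib.md5(payload).hexdigest()
```

With this approach a trailing comment leaves the digest unchanged, while a changed constant or statement alters it. Edits that shift line numbers also change the digest, because code objects record line information.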

Martijn Pieters

One possible (untested) solution is to use dis.dis() from the disassembler module to convert a Python class or module (but not an instance) into a bytecode disassembly listing. Two identically written classes with different class names will produce identical output, but this can be fixed by prepending cls.__name__ before running the combined string through md5.

Note that dis.dis() prints to stdout rather than returning a string, so there is the added step of capturing the printed output with StringIO.

>>> import dis, md5, sys
>>> from StringIO import StringIO
>>> class A(object):
...   def __init__(self, item): print "A(%s)" % item
...
>>> dis.dis(A)
Disassembly of __init__:
  2           0 LOAD_CONST               1 ('A(%s)')
              3 LOAD_FAST                1 (item)
              6 BINARY_MODULO
              7 PRINT_ITEM
              8 PRINT_NEWLINE
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE

>>> class B(A):
...   def __init__(self, item): super(B, self).__init__(item); print "B(%s)" % item
...
>>> dis.dis(B)
Disassembly of __init__:
  2           0 LOAD_GLOBAL              0 (super)
              3 LOAD_GLOBAL              1 (B)
              6 LOAD_FAST                0 (self)
              9 CALL_FUNCTION            2
             12 LOAD_ATTR                2 (__init__)
             15 LOAD_FAST                1 (item)
             18 CALL_FUNCTION            1
             21 POP_TOP
             22 LOAD_CONST               1 ('B(%s)')
             25 LOAD_FAST                1 (item)
             28 BINARY_MODULO
             29 PRINT_ITEM
             30 PRINT_NEWLINE
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE

>>> class Capturing(list):
...     # Context manager that collects everything printed to stdout.
...     def __enter__(self):
...         self._stdout = sys.stdout
...         sys.stdout = self._stringio = StringIO()
...         return self
...     def __exit__(self, *args):
...         self.extend(self._stringio.getvalue().splitlines())
...         del self._stringio    # free up some memory
...         sys.stdout = self._stdout
...
>>> with Capturing() as dis_output: dis.dis(A)
...
>>> A_md5 = md5.new(A.__name__ + "\n".join(dis_output)).hexdigest()
>>> A_md5
'7818f1864b9cdf106b509906813e4ff8'
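
The transcript above is Python 2 (print statements, the md5 and StringIO modules). As a rough Python 3 sketch of the same idea, dis.dis() accepts a file argument, so the output can be written straight into an io.StringIO buffer without replacing sys.stdout. The function name disassembly_md5 and the two sample classes are made up for illustration. One caveat: disassembly of nested functions can include memory addresses, which would make the digest differ between runs.

```python
import dis
import hashlib
import io


def disassembly_md5(obj):
    """Return the hex MD5 of obj's name plus its dis.dis() listing."""
    buffer = io.StringIO()
    dis.dis(obj, file=buffer)  # write the listing to the buffer, not stdout
    text = obj.__name__ + "\n" + buffer.getvalue()
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class A(object):
    def __init__(self, item):
        print("A(%s)" % item)


class B(object):
    def __init__(self, item):
        print("A(%s)" % item)
```

Here disassembly_md5(A) is stable within a run and differs from disassembly_md5(B) even though the method bodies match, because the class name is part of the hashed text.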
James McGuigan