
I would like to understand how to use dis (the disassembler of Python bytecode). Specifically, how should one interpret the output of dis.dis (or dis.disassemble)?

Here is a very specific example (in Python 2.7.3):

dis.dis("heapq.nsmallest(d,3)")

      0 BUILD_SET             24933
      3 JUMP_IF_TRUE_OR_POP   11889
      6 JUMP_FORWARD          28019 (to 28028)
      9 STORE_GLOBAL          27756 (27756)
     12 LOAD_NAME             29811 (29811)
     15 STORE_SLICE+0  
     16 LOAD_CONST            13100 (13100)
     19 STORE_SLICE+1

I see that JUMP_IF_TRUE_OR_POP etc. are bytecode instructions (although interestingly, BUILD_SET does not appear in this list, though I expect it works as BUILD_TUPLE). I think the numbers on the right-hand-side are memory allocations, and the numbers on the left are goto numbers... I notice they almost increment by 3 each time (but not quite).

If I wrap the heapq.nsmallest call inside a function and disassemble a string that calls it:

def f_heapq_nsmallest(d,n):
    return heapq.nsmallest(d,n)

dis.dis("f_heapq(d,3)")

      0 BUILD_TUPLE            26719
      3 LOAD_NAME              28769 (28769)
      6 JUMP_ABSOLUTE          25640
      9 <44>                                      # what is <44> ?  
     10 DELETE_SLICE+1 
     11 STORE_SLICE+1 
Andy Hayden

2 Answers

You are trying to disassemble a string containing source code, but that's not supported by dis.dis in Python 2. With a string argument, it treats the string as if it contained byte code (see the function disassemble_string in dis.py). So you are seeing nonsensical output based on misinterpreting source code as byte code.
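To see where the odd numbers in the question come from, here is a minimal sketch that reads the first three characters of the source string the way a Python 2 style disassembler would read bytecode: one byte as the opcode, then two bytes as a little-endian 16-bit oparg. The variable names are illustrative only.

```python
# Each character of the source string is treated as one byte of "bytecode".
source = "heapq.nsmallest(d,3)"

# The first byte is taken as the opcode ('h' is 104).
first_opcode = ord(source[0])

# The next two bytes ('e', 'a') are combined as a little-endian 16-bit oparg.
first_oparg = ord(source[1]) | (ord(source[2]) << 8)

print(first_opcode, first_oparg)  # 104 24933
```

The value 24933 is the number printed next to BUILD_SET in the question's output, so the "arguments" there are just pairs of characters from the source text.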

Things are different in Python 3, where dis.dis compiles a string argument before disassembling it:

Python 3.2.3 (default, Aug 13 2012, 22:28:10) 
>>> import dis
>>> dis.dis('heapq.nlargest(d,3)')
  1           0 LOAD_NAME                0 (heapq) 
              3 LOAD_ATTR                1 (nlargest) 
              6 LOAD_NAME                2 (d) 
              9 LOAD_CONST               0 (3) 
             12 CALL_FUNCTION            2 
             15 RETURN_VALUE         

In Python 2 you need to compile the code yourself before passing it to dis.dis:

Python 2.7.3 (default, Aug 13 2012, 18:25:43) 
>>> import dis
>>> dis.dis(compile('heapq.nlargest(d,3)', '<none>', 'eval'))
  1           0 LOAD_NAME                0 (heapq)
              3 LOAD_ATTR                1 (nlargest)
              6 LOAD_NAME                2 (d)
              9 LOAD_CONST               0 (3)
             12 CALL_FUNCTION            2
             15 RETURN_VALUE        

What do the numbers mean? The number 1 on the far left is the line number in the source code from which this byte code was compiled. The numbers in the column on the left are the offset of the instruction within the bytecode, and the numbers on the right are the opargs. Let's look at the actual byte code:

>>> co = compile('heapq.nlargest(d,3)', '<none>', 'eval')
>>> co.co_code.encode('hex')
'6500006a010065020064000083020053'

At offset 0 in the byte code we find 65, the opcode for LOAD_NAME, with the oparg 0000; then (at offset 3) 6a is the opcode LOAD_ATTR, with 0100 the oparg, and so on. Note that the opargs are in little-endian order, so that 0100 is the number 1. The undocumented opcode module contains tables opname giving you the name for each opcode, and opmap giving you the opcode for each name:

>>> import opcode
>>> opcode.opname[0x65]
'LOAD_NAME'
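The byte layout described above can be walked by hand. The following is a rough sketch, not the real dis implementation: it hard-codes the Python 2.7 opcode values for the five instructions in this example and assumes the Python 2.7 rule that opcodes of 90 or more are followed by a two-byte little-endian argument. The names `NAMES` and `decode` are made up for illustration.

```python
# Bytecode from the answer, as produced by Python 2.7 for 'heapq.nlargest(d,3)'.
code = bytes.fromhex("6500006a010065020064000083020053")

# Python 2.7 opcode values for the instructions that occur in this example.
NAMES = {
    0x65: "LOAD_NAME",
    0x6A: "LOAD_ATTR",
    0x64: "LOAD_CONST",
    0x83: "CALL_FUNCTION",
    0x53: "RETURN_VALUE",
}
HAVE_ARGUMENT = 90  # in Python 2.7, opcodes >= 90 carry a 2-byte argument


def decode(code):
    """Return (offset, opname, oparg) tuples for Python 2.7 style bytecode."""
    result = []
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            # Low byte first, then high byte (little-endian).
            arg = code[i + 1] | (code[i + 2] << 8)
            result.append((i, NAMES[op], arg))
            i += 3
        else:
            result.append((i, NAMES[op], None))
            i += 1
    return result


for offset, name, arg in decode(code):
    print(offset, name, "" if arg is None else arg)
```

The offsets it prints (0, 3, 6, 9, 12, 15) are the same ones shown in the left-hand column of the disassembly above.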

The meaning of the oparg depends on the opcode, and for the full story you need to read the implementation of the CPython virtual machine in ceval.c. For LOAD_NAME and LOAD_ATTR the oparg is an index into the co_names property of the code object:

>>> co.co_names
('heapq', 'nlargest', 'd')

For LOAD_CONST it is an index into the co_consts property of the code object:

>>> co.co_consts
(3,)
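The co_names lookup can be checked directly in Python 3.4 or later with dis.get_instructions. This sketch only looks at LOAD_NAME, because the oparg encoding of some other opcodes (including LOAD_ATTR and how small constants are loaded) has changed across recent CPython versions; treat it as an illustration rather than a general decoder.

```python
import dis

co = compile("heapq.nlargest(d,3)", "<none>", "eval")

# For every LOAD_NAME, pair the raw oparg with the name it indexes in co_names
# and with the value that dis itself resolved (argval).
resolved = [
    (ins.arg, co.co_names[ins.arg], ins.argval)
    for ins in dis.get_instructions(co)
    if ins.opname == "LOAD_NAME"
]

for arg, name, argval in resolved:
    print(arg, name, argval)
```

For each row, the name found by indexing co_names with the oparg should be the same as the argval reported by dis.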

For CALL_FUNCTION, it is the number of arguments to pass to the function, encoded in 16 bits with the number of positional arguments in the low byte and the number of keyword arguments in the high byte. (This encoding applies to Python 2 and to Python 3 up to 3.5; it changed in Python 3.6.)
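As a small sketch of that encoding (which, as far as I know, is the one used by Python 2 and by Python 3 before 3.6), the two counts can be split out of the oparg with a mask and a shift. The helper name `split_call_oparg` is hypothetical.

```python
def split_call_oparg(oparg):
    """Split an old-style CALL_FUNCTION oparg into (positional, keyword) counts."""
    positional = oparg & 0xFF      # low byte
    keyword = (oparg >> 8) & 0xFF  # high byte
    return positional, keyword


print(split_call_oparg(2))       # the example above: (2, 0)
print(split_call_oparg(0x0102))  # two positional, one keyword: (2, 1)
```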

Gareth Rees
  • Amazing. Is there any reference, tutorial, or book that covers all these low-level details? I want to dig deeper. – DevC Oct 18 '13 at 10:03
  • @DevC: Code objects are documented with the [`inspect`](http://docs.python.org/3/library/inspect.html) module. Byte code instructions are documented with the [`dis`](http://docs.python.org/3/library/dis.html#bytecodes) module. For the implementation details of the CPython virtual machine, you have to read the source code in [`ceval.c`](http://hg.python.org/cpython/file/47618b00405b/Python/ceval.c). – Gareth Rees Oct 19 '13 at 11:07
  • So [this](http://hg.python.org/cpython/file/default/Lib/dis.py#l291) shows that `oparg` is little-endian and two bytes long? – schemacs Feb 08 '14 at 08:00
  • Yes; also the [`NEXTARG` and `PEEKARG` macros in `ceval.c`](http://hg.python.org/cpython/file/47618b00405b/Python/ceval.c#l992). – Gareth Rees Feb 09 '14 at 22:09
  • @GarethRees for documentation on `ceval.c` you can refer to the Execution Model https://docs.python.org/3.3/reference/executionmodel.html – KeatsKelleher Sep 18 '17 at 11:15
  • Excuse me, what is the meaning of the numbers 0 1 2 0 2 in the fourth column? – dogewang Jun 04 '18 at 03:16
  • @dogewang: See the section of the answer starting "the numbers on the right". – Gareth Rees Jun 04 '18 at 07:55
  • @GarethRees what does number 2 represent after CALL_FUNCTION? And how does it know which function to call? Does it go back on the stack to pick up what was loaded with LOAD_NAME and LOAD_ATTR? – Fazzolini Oct 13 '18 at 19:43
  • I think I found the answer. Since CALL_FUNCTION knows how many parameters there are, it picks them all from the stack one by one. The function to be called is immediately below the parameters on the stack. I found the answer [here](http://unpyc.sourceforge.net/Opcodes.html). – Fazzolini Oct 13 '18 at 19:47
  • @Fazzolini: See the documentation for the [`CALL_FUNCTION`](https://docs.python.org/3/library/dis.html#opcode-CALL_FUNCTION) opcode: "The top of the stack contains positional arguments, with the right-most argument on top. Below the arguments is a callable object to call." – Gareth Rees Oct 13 '18 at 20:29

I am reposting my answer to another question, in order to be sure to find it while Googling dis.dis().


To complement Gareth Rees's great answer, here is a short column-by-column summary of the output of disassembled bytecode.

For example, given this function:

def f(num):
    if num == 42:
        return True
    return False

This may be disassembled into (Python 3.6):

(1)|(2)|(3)|(4)|          (5)         |(6)|  (7)
---|---|---|---|----------------------|---|-------
  2|   |   |  0|LOAD_FAST             |  0|(num)
   |-->|   |  2|LOAD_CONST            |  1|(42)
   |   |   |  4|COMPARE_OP            |  2|(==)
   |   |   |  6|POP_JUMP_IF_FALSE     | 12|
   |   |   |   |                      |   |
  3|   |   |  8|LOAD_CONST            |  2|(True)
   |   |   | 10|RETURN_VALUE          |   |
   |   |   |   |                      |   |
  4|   |>> | 12|LOAD_CONST            |  3|(False)
   |   |   | 14|RETURN_VALUE          |   |

Each column has a specific purpose:

  1. The corresponding line number in the source code
  2. Optionally marks the instruction currently being executed (for example, when the bytecode comes from a frame object)
  3. A label indicating that this instruction is the target of a possible jump from another instruction
  4. The address of the instruction, given as its byte index in the bytecode (these are multiples of 2 because Python 3.6 uses 2 bytes for each instruction, whereas the size could vary in previous versions)
  5. The instruction name (also called opname); each one is briefly explained in the dis module documentation, and the implementations can be found in ceval.c (the core loop of CPython)
  6. The argument (if any) of the instruction, which Python uses internally to fetch constants or variables, manage the stack, jump to a specific instruction, etc.
  7. The human-friendly interpretation of the instruction argument
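Most of these columns can also be read programmatically. The sketch below uses dis.get_instructions (Python 3.4+) and prints the jump label, offset, opname, raw argument, and the interpreted argument for each instruction. The exact instructions and offsets depend on the Python version, so the output will not necessarily match the Python 3.6 table above; the row layout is my own choice.

```python
import dis


def f(num):
    if num == 42:
        return True
    return False


# One tuple per instruction: (offset, jump label, opname, raw arg, readable arg).
rows = [
    (ins.offset, ">>" if ins.is_jump_target else "", ins.opname, ins.arg, ins.argrepr)
    for ins in dis.get_instructions(f)
]

for offset, label, opname, arg, argrepr in rows:
    arg_text = "" if arg is None else str(arg)
    print("{:>3} {:>4} {:<22} {:>4} {}".format(label, offset, opname, arg_text, argrepr))
```

The line-number column is left out here because the attribute that carries it differs between Python versions.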
Delgan
  • This is what I was looking for! I couldn't find it in the official docs, thanks. – Tarjintor Apr 24 '18 at 14:04
  • Is that nice output table a manual creation or is there a way to get that from `dis`? I see this https://docs.python.org/3/library/dis.html#dis.disco entry in the docs suggesting that I should be seeing those arrows and columns, but in my repl I do not... – d8aninja Mar 28 '21 at 23:58
  • @d8aninja I created the table manually to ease the explanation. :) You should not expect to see the columns in `dis` output. The arrows on (2) and (3) are possible though, but it depends on the code you're disassembling. – Delgan Mar 29 '21 at 08:15