9

I need a function which can calculate the length of an x86-64 instruction.

For example, it would be usable like so:

unsigned char ret[] = { 0xc3 };
size_t length = instructionLength(ret);

length would be set to 1 in this example.

I do not want to include an entire disassembly library, since the only information I require is the length of the instruction.

I am looking for a minimalist approach, written in C, and ideally as small as possible.

Complete coverage of the x86-64 instruction set is not strictly necessary (very obscure instructions, such as those operating on vector registers, can be omitted).

A similar answer to what I am looking for (but for the wrong architecture):

Get size of assembly instructions

CHRIS
  • Why, exactly, would you need this outside the context of a disassembler? – Cody Gray - on strike May 28 '17 at 14:24
  • 2
    You will end up using library anyway or going to reinvent a wheel. Waste effort apparently. – 0andriy May 28 '17 at 14:28
  • 3
    SIMD instructions are common, not obscure. Anyway it looks reasonable to adapt that, there aren't many changes (REX and a 64bit immediate load) – harold May 28 '17 at 14:28
  • 1
    do you want us to point you to existing code or write code for you? – Iłya Bursov May 28 '17 at 14:30
  • If there is existing code which does exactly what I have asked, please link to it. – CHRIS May 28 '17 at 14:35
  • VTR: although the answer is a library, it is an official Intel library so I don't believe it is subject to being an "opinionated answer". The question and the accepted answer are precise and may be of use to future readers. – rici May 28 '17 at 15:49
  • Didn't want to have to include an entire library, but have settled for the accepted answer anyway. – CHRIS May 28 '17 at 15:55
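
As a rough illustration of the adaptation harold's comment describes (the 32-bit approach plus REX prefixes and the 64-bit immediate load), here is a minimal sketch in C. The names `insn_len_sketch`, `modrm_len` and `is_legacy_prefix` are hypothetical, and the opcode coverage is deliberately tiny: a few no-operand instructions, `mov reg, imm`, near branches, common ModRM forms, and a handful of `0F` opcodes. Anything else (VEX/EVEX, x87, most of the `0F` map) returns 0, and no validity checking is attempted, so treat it as a starting point rather than a replacement for a real decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of ModRM + optional SIB + displacement in 64-bit mode.
 * Returns 0 if the bytes needed to decide are not available. */
static size_t modrm_len(const uint8_t *p, size_t avail)
{
    if (avail < 1) return 0;
    uint8_t mod = p[0] >> 6, rm = p[0] & 7;
    size_t n = 1;
    if (mod == 3) return n;                      /* register operand only */
    if (rm == 4) {                               /* SIB byte follows */
        if (avail < 2) return 0;
        n += 1;
        if (mod == 0 && (p[1] & 7) == 5) n += 4; /* no base register: disp32 */
    } else if (mod == 0 && rm == 5) {
        n += 4;                                  /* RIP-relative disp32 */
    }
    if (mod == 1) n += 1;                        /* disp8 */
    if (mod == 2) n += 4;                        /* disp32 */
    return n;
}

static int is_legacy_prefix(uint8_t b)
{
    return b == 0x66 || b == 0x67 || b == 0xF0 || b == 0xF2 || b == 0xF3 ||
           b == 0x2E || b == 0x36 || b == 0x3E || b == 0x26 ||
           b == 0x64 || b == 0x65;
}

/* Hypothetical helper: length of the instruction at `code`, or 0 when the
 * opcode is outside the small subset handled here or the buffer is too short. */
size_t insn_len_sketch(const uint8_t *code, size_t avail)
{
    size_t i = 0, imm = 0, m = 0;
    int has_modrm = 0, opsize16 = 0, rex_w = 0;

    while (i < avail && is_legacy_prefix(code[i])) {
        if (code[i] == 0x66) opsize16 = 1;
        i++;
    }
    if (i < avail && (code[i] & 0xF0) == 0x40) { /* REX, assumed last prefix */
        rex_w = (code[i] >> 3) & 1;
        i++;
    }
    if (i >= avail) return 0;

    uint8_t op = code[i++];
    size_t immz = (opsize16 && !rex_w) ? 2 : 4;  /* width of an imm16/32 */

    if (op == 0x0F) {                            /* two-byte opcode map */
        if (i >= avail) return 0;
        op = code[i++];
        if (op >= 0x80 && op <= 0x8F) {
            imm = 4;                             /* jcc rel32 */
        } else if (op == 0x05) {
            /* syscall: no operand bytes */
        } else if (op == 0x1F || op == 0xAF || (op >= 0x40 && op <= 0x4F) ||
                   op == 0xB6 || op == 0xB7 || op == 0xBE || op == 0xBF) {
            has_modrm = 1;          /* nop r/m, imul, cmovcc, movzx, movsx */
        } else {
            return 0;
        }
    } else if (op == 0xC3 || op == 0x90 || op == 0xCC || op == 0xC9 ||
               (op >= 0x50 && op <= 0x5F)) {
        /* ret, nop, int3, leave, push/pop reg: no operand bytes */
    } else if (op >= 0xB8 && op <= 0xBF) {
        imm = rex_w ? 8 : immz;                  /* mov reg, imm (imm64 with REX.W) */
    } else if (op == 0xE8 || op == 0xE9) {
        imm = 4;                                 /* call/jmp rel32 */
    } else if (op == 0xEB || (op >= 0x70 && op <= 0x7F)) {
        imm = 1;                                 /* jmp/jcc rel8 */
    } else if ((op < 0x40 && (op & 7) < 4) || (op >= 0x84 && op <= 0x8B) ||
               op == 0x8D || op == 0xFF) {
        has_modrm = 1;          /* ALU r/m forms, test, xchg, mov, lea, group 5 */
    } else if (op == 0x83) {
        has_modrm = 1; imm = 1;                  /* ALU r/m, imm8 */
    } else if (op == 0x81 || op == 0xC7) {
        has_modrm = 1; imm = immz;               /* ALU/mov r/m, imm16/32 */
    } else {
        return 0;                                /* not covered by this sketch */
    }

    if (has_modrm) {
        m = modrm_len(code + i, avail - i);
        if (m == 0) return 0;
    }
    size_t total = i + m + imm;
    return (total <= avail && total <= 15) ? total : 0;
}
```

Unlike the `instructionLength(ret)` call in the question, this version takes the number of available bytes as a second argument so it never reads past the buffer; for the `ret` example it would be called as `insn_len_sketch(ret, sizeof ret)`.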

2 Answers

7

Intel provides the XED library for working with x86/x86_64 instructions: https://github.com/intelxed/xed, and it is the only correct way to work with Intel machine code.

The xed_decode function gives you all the information about an instruction: https://intelxed.github.io/ref-manual/group__DEC.html https://intelxed.github.io/ref-manual/group__DEC.html#ga9a27c2bb97caf98a6024567b261d0652

And xed_ild_decode is for instruction length decoding: https://intelxed.github.io/ref-manual/group__DEC.html#ga4bef6152f61997a47c4e0fe4327a3254

XED_DLL_EXPORT xed_error_enum_t xed_ild_decode(xed_decoded_inst_t *xedd,
                                               const xed_uint8_t *itext,
                                               const unsigned int bytes)

This function just does instruction length decoding.

It does not return a fully decoded instruction.

Parameters

  • xedd the decoded instruction of type xed_decoded_inst_t . Mode/state sent in via xedd; See the xed_state_t .
  • itext the pointer to the array of instruction text bytes
  • bytes the length of the itext input array. 1 to 15 bytes, anything more is ignored.

Returns:

xed_error_enum_t indicating success (XED_ERROR_NONE) or failure. Only two failure codes are valid for this function: XED_ERROR_BUFFER_TOO_SHORT and XED_ERROR_GENERAL_ERROR. In general this function cannot tell if the instruction is valid or not. For valid instructions, XED can figure out if enough bytes were provided to decode the instruction. If not enough were provided, XED returns XED_ERROR_BUFFER_TOO_SHORT. From this function, the XED_ERROR_GENERAL_ERROR is an indication that XED could not decode the instruction's length because the instruction was so invalid that even its length may vary across implementations.

To get length from xedd filled by xed_ild_decode, use xed_decoded_inst_get_length: https://intelxed.github.io/ref-manual/group__DEC.html#gad1051f7b86c94d5670f684a6ea79fcdf

static XED_INLINE xed_uint_t xed_decoded_inst_get_length(const xed_decoded_inst_t *p)

Return the length of the decoded instruction in bytes.

Example code ("Apache License, Version 2.0", by Intel 2016): https://github.com/intelxed/xed/blob/master/examples/xed-ex-ild.c

#include "xed/xed-interface.h"
#include <stdio.h>

int main()
{
    xed_bool_t long_mode = 1;
    xed_decoded_inst_t xedd;
    xed_state_t dstate;
    unsigned char itext[15] = { 0xf2, 0x2e, 0x4f, 0x0F, 0x85, 0x99,
                                0x00, 0x00, 0x00 };

    xed_tables_init(); // one time per process

    if (long_mode) 
        dstate.mmode=XED_MACHINE_MODE_LONG_64;
    else 
        dstate.mmode=XED_MACHINE_MODE_LEGACY_32;

    xed_decoded_inst_zero_set_mode(&xedd, &dstate);
    xed_ild_decode(&xedd, itext, XED_MAX_INSTRUCTION_BYTES);
    printf("length = %u\n",xed_decoded_inst_get_length(&xedd));

    return 0;
}
osgx
  • 1
    Doesn't stop after decoding a single instruction, so you need to iteratively perform `xed_ild_decode` until it stops returning `XED_ERROR_BUFFER_TOO_SHORT` in order to make sure it only decodes 1 instruction. – CHRIS May 28 '17 at 15:40
  • CHRIS, `xed_ild_decode` only decodes a single instruction, but the result is correct only when the last argument is equal to or greater than the real instruction length. If you think that the example code is incorrect, open an issue at https://github.com/intelxed/xed – osgx May 28 '17 at 15:42
  • XED also has the largest public database of x86/x86_64 instructions: https://github.com/intelxed/xed/tree/master/datafiles with the format documented in misc/engineering-notes.txt - https://github.com/intelxed/xed/blob/master/misc/engineering-notes.txt – osgx May 28 '17 at 15:48
  • I seem to have some problem using it more than once. Does `xed_decoded_inst_get_length` reset to 0 after each use or does it return total number of bytes decoded? – CHRIS May 28 '17 at 15:49
  • 1
    Oh you need to call `xed_decoded_inst_zero_set_mode` every single time, not just once. – CHRIS May 28 '17 at 15:52
  • 1
    Does not reset anything: https://github.com/intelxed/xed/blob/b42583afd6664acf53928e69d79649327a93d805/include/public/xed/xed-decoded-inst-api.h#L274 - only reads `return p->_decoded_length;`; `xed_decoded_inst_zero_set_mode` will memset the structure to zero: https://github.com/intelxed/xed/blob/b42583afd6664acf53928e69d79649327a93d805/src/dec/xed-decoded-init.c#L27 `memset(p, 0, sizeof(xed_decoded_inst_t));` and init some fields in it. Sorry, not the 'minimalist approach' you asked, but correct way to find length in complex variable-length encoding architecture modified every year since 1978 – osgx May 28 '17 at 15:52
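
The comments above come down to decoding one instruction per call and starting from a clean state each time. A minimal sketch of walking a buffer that way, with the decoder passed in as a callback, might look like the following. Everything named here (`insn_len_fn`, `count_instructions`, `toy_len`) is hypothetical; with XED the callback would presumably wrap `xed_decoded_inst_zero_set_mode`, `xed_ild_decode` and `xed_decoded_inst_get_length`, re-initializing `xedd` on every call. `toy_len` is only a stand-in that understands `nop` and `jmp rel32` so the loop can be exercised without the library.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INSN_BYTES 15   /* x86 instructions are at most 15 bytes long */

/* Hypothetical callback type: length of the single instruction at `code`,
 * looking at no more than `avail` bytes, or 0 on failure. */
typedef size_t (*insn_len_fn)(const uint8_t *code, size_t avail);

/* Stand-in decoder for illustration only: nop (0x90) and jmp rel32 (0xE9). */
size_t toy_len(const uint8_t *code, size_t avail)
{
    if (avail >= 1 && code[0] == 0x90) return 1;
    if (avail >= 5 && code[0] == 0xE9) return 5;
    return 0;
}

/* Decodes buf[0..size) one instruction at a time and returns how many
 * instructions were decoded. Stops at the first failure; the number of
 * bytes consumed up to that point is stored in *consumed if non-NULL. */
size_t count_instructions(const uint8_t *buf, size_t size,
                          insn_len_fn decode_one, size_t *consumed)
{
    size_t off = 0, count = 0;

    while (off < size) {
        size_t avail = size - off;
        if (avail > MAX_INSN_BYTES)
            avail = MAX_INSN_BYTES;          /* never offer more than 15 bytes */

        size_t len = decode_one(buf + off, avail);
        if (len == 0 || len > avail)
            break;                           /* undecodable or truncated */

        off += len;                          /* advance by exactly one instruction */
        count++;
    }
    if (consumed)
        *consumed = off;
    return count;
}
```

Keeping the decoder behind a function pointer means the loop itself does not care whether lengths come from XED or from a hand-written table.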
1

If you're on Windows, you can just use IDebugControl::Disassemble(..., &end_address) from dbgeng.dll. See this question for example usage.

user541686