3

Possible Duplicate:
How to get the length of a function in bytes?

I'm writing a hooking program that will be used to insert a function into a specified region of memory.

I need to get the length of a local C++ function. I've used a cast to get the address of the function, but how would I get its length?

Would

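int GetFuncLen()
{
  // Scan byte by byte from the start of Function for the first 0xC3 (x86 RET).
  const unsigned char* p = (const unsigned char*)Function;
  int i = 0;
  while(i<max)
  {
    if(p[i]==0xC3)
    {
      return i;
    }
    i++;
  }
  return -1; // no 0xC3 byte found within max bytes
}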

work?

iDomo
  • 8
    There is no portable way to do that. (And no, your version won't always work.) I'm guessing you're on Windows from those `DWORD` things. Maybe Windows-specific facilities exist for this, but please specify your compiler in that case. – Mat Jan 02 '12 at 20:54
  • 1
    Are you trying to acquire the size of the function so you can just copy it over and inject it into a remote process? If so, that won't work. – Captain Obvlious Jan 02 '12 at 20:58
  • 2
    have you considered these: [(a)](http://stackoverflow.com/q/5655624/1025391), [(b)](http://stackoverflow.com/q/8269832/1025391), [(c)](http://stackoverflow.com/q/4156585/1025391) ? – moooeeeep Jan 02 '12 at 20:59
  • 1
    This is way harder than it sounds. One time I tried something called LDE64 (or something like that) which was supposed to calculate the length of a function by following the path of the code, following branches, etc. to determine the farthest point from which the function returned, but it had a lot of problems and almost never worked, especially with optimisations on. – Seth Carnegie Jan 02 '12 at 21:00
  • 1
    Can you describe why any function is unfit to be at the location it's already at? That's an unusual problem to have. – Drew Dormann Jan 02 '12 at 21:03
  • And also Chet Simpson is correct: there is more to consider than just `memcpy`ing over the code, even if you do know its size – Seth Carnegie Jan 02 '12 at 21:08
  • @DrewDormann he's probably trying to copy the function to another process's memory with WriteProcessMemory, etc. – Seth Carnegie Jan 02 '12 at 21:15
  • @iDomo: why do you ask that? What does "hooking program" mean to you? Maybe your overall goal (which we don't understand) could be achieved some other way. – Basile Starynkevitch Jan 02 '12 at 21:36

5 Answers

5

Your code seems to be operating system, compiler, and machine architecture specific.

(I know nothing about Windows)

It could be wrong if max is not defined.

It is operating system specific (probably Windows only) because DWORD is not a standard C++ type. You could use intptr_t (from <cstdint> header).

Your code is compiler specific, because you assume that every compiled function has a well-defined, unique end and doesn't share any code with other functions. (Some compilers can perform such optimizations, e.g. making two functions share a common epilogue or code chunk by using jump instructions.)

Your code is machine specific, because you assume that the last instruction would be a RET coded 0xC3 and this is specific to x86 & x86-64 (won't work on Alpha or ARM, on which Windows is rumored to have been or to be ported). Also, that byte could appear inside other instructions or inlined constants (as Mat commented).

I am not sure that the notion of where a binary function ends has a well-defined meaning. But if it does, I would expect the linker to know about it. On some systems, for example Linux with ELF executables, the compiler and the linker record the size of each function.

Perhaps what you really need is to find the symbol nearest to a given address. I don't know if Windows has such functionality (on Linux, the dladdr GNU function from <dlfcn.h> could be useful). Perhaps your operating system provides an equivalent?

Basile Starynkevitch
  • 4
    The `0xC3` trick also assumes that `0xC3` doesn't appear anywhere but at the end of the function, and that's a pretty weak assumption. – Mat Jan 02 '12 at 21:08
  • Windows CE is doing well on ARM and MIPS, and I have seen Windows 8 running on an ARM device as well. – marcinj Jan 02 '12 at 21:11
5

No. For a few reasons.

1) 0xC3 is only a 'ret' instruction if it is at the point where a new instruction is expected. There could easily be other instructions that include a 0xc3 byte within their operands, and you'd only get part of the code.

2) There can be multiple 'ret' instructions in any given function, depending on the compiler and its settings. Again, you'd only get part of the function.

3) Functions often use constructs like "switch" statements, that use "jump tables" that are located AFTER the ret instruction. Again, you'd only get part of the function.
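
The first two points can be made concrete with hand-assembled x86 byte sequences. The following is a minimal sketch, not part of the original answer: `naive_len` is a hypothetical stand-in for the scan in the question, and the two byte arrays are illustrative encodings chosen for this example.

```cpp
#include <cstddef>

// Hypothetical stand-in for the question's scan: returns the index of the
// first 0xC3 byte, or n if there is none. It knows nothing about where
// instructions begin or end.
std::size_t naive_len(const unsigned char* code, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (code[i] == 0xC3) return i;
    }
    return n;
}

// Point 1: a 0xC3 byte inside an operand.
const unsigned char operand_case[] = {
    0xB8, 0xC3, 0x00, 0x00, 0x00,  // mov eax, 0xC3 (0xC3 is the immediate)
    0xC3                           // ret (the real end, at offset 5)
};

// Point 2: two ret instructions in one function.
const unsigned char two_ret_case[] = {
    0x85, 0xFF,                    // test edi, edi
    0x75, 0x03,                    // jne +3 (skips to the second mov)
    0x31, 0xC0,                    // xor eax, eax
    0xC3,                          // ret (first exit, at offset 6)
    0xB8, 0x01, 0x00, 0x00, 0x00,  // mov eax, 1
    0xC3                           // ret (second exit, at offset 12)
};
```

Calling `naive_len(operand_case, sizeof operand_case)` yields 1 even though the sequence is 6 bytes long, and the scan of `two_ret_case` stops at offset 6 of 13, cutting off the second branch.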

And what you're trying to do is not likely to work anyway.

The biggest problem is that various assembly instructions will often reference specific areas of memory by using offsets rather than absolute addresses. So while extremely minimal functions might work, any functions that call out into other functions will likely fail.
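
As an illustration of the offset problem: on x86, a near `call` (opcode `0xE8`) stores its target as a signed 32-bit displacement from the address of the next instruction. The sketch below uses a hypothetical helper, `call_target`, and made-up addresses to show how the same five bytes resolve to a different target once they are moved.

```cpp
#include <cstdint>

// Hypothetical helper: computes where an x86 near call (E8 rel32) located
// at insn_addr transfers control.
// target = address of the next instruction + signed displacement.
std::uint64_t call_target(const unsigned char* insn, std::uint64_t insn_addr) {
    // Assemble the little-endian 32-bit displacement byte by byte.
    const std::uint32_t raw = static_cast<std::uint32_t>(insn[1])
                            | static_cast<std::uint32_t>(insn[2]) << 8
                            | static_cast<std::uint32_t>(insn[3]) << 16
                            | static_cast<std::uint32_t>(insn[4]) << 24;
    // Reinterpret as signed so that backward calls work too.
    const std::int32_t rel = static_cast<std::int32_t>(raw);
    // The instruction is 5 bytes long: the opcode plus 4 displacement bytes.
    const std::uint64_t next = insn_addr + 5;
    return next + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel));
}

// call with displacement +0x10 (illustrative bytes)
const unsigned char call_insn[] = {0xE8, 0x10, 0x00, 0x00, 0x00};
```

With the instruction at 0x1000, `call_target(call_insn, 0x1000)` is 0x1015. Copy the same bytes to 0x5000 and it becomes 0x5015, which is no longer the function the original code meant to call.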

Assuming you're trying to load these functions into an external process, and you're trying to do this on Windows, a better solution is to use DLL injection to load a DLL into your target process.

If you really need to inject the memory, then you'll need an assembly language parser for your particular platform to update all of the memory addresses for the relevant instructions, which is a very complex task. Or you could write your functions in assembly language and make sure that you're not using relative offsets for anything other than referencing parts of your own code, which is a bit easier, but more limiting in what you can do.

Gerald
  • I found this to have nothing to do with my question. – iDomo Jan 02 '12 at 22:26
  • @iDomo Your question was answered with the very first word of the answer. Searching for 0xC3 will not work. If you really aren't looking to inject your code into another process - which would probably make what you ARE trying to do pointless - then you can ignore the third paragraph. The rest still applies, whether you are trying to move it in the same process or not. If you choose to assume otherwise, you will find out soon enough. – Gerald Jan 02 '12 at 22:50
  • I elaborated a little more on the answer to your specific question. – Gerald Jan 02 '12 at 23:02
  • 1
    Nitpicking: it is not an *assembly* language parser, but a **machine language parser** (which for x86 is quite hard, because you don't know where an instruction starts; a `0xc3` byte could be the `RET` instruction, but it could also be inside some other instruction). – Basile Starynkevitch Jan 03 '12 at 13:06
4

You could force your function to be put in a section all by itself (see e.g. http://msdn.microsoft.com/en-us/library/s20kdbse(v=VS.71).aspx).

I think that if you define a section, declare a variable in it, define your function in it, and then define another variable in it, the addresses of the two variables will bracket your function.

A better approach is to put the two variables and the function in separate sections and then use section merging to control the order in which they appear in the resulting code (see "How to refer to the start-of a user-defined segment in a Visual Studio-project?").

As others have pointed out you probably can't do anything useful with this, and it's not at all portable.

Alan Stokes
2

The only reliable way to do this is to compile your code with a dummy number for the length of the function (but not run it), disassemble it, and calculate the length by hand, then take that number and substitute it for the dummy number, and recompile your program.

When I needed to do this, I just made a guess as to how big the function should be. As long as your guess is not too small (and not way, way too big), you should have no problems.

Seth Carnegie
1

You can use objdump to get the size of objects with external linkage. Otherwise, you could take the assembly output of the compiler (gcc -S, e.g.) and assemble it manually. That gives you the opportunity to see what names the length fields get:

        .file   "test.cpp"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    %edi, -4(%rbp)
        movq    %rsi, -16(%rbp)
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 4.6.0-3~ppa1) 4.6.1 20110409 (prerelease)"
        .section        .note.GNU-stack,"",@progbits

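See the .size main, .-main directive: it calculates the function size. The `.` stands for the current location, so `.-main` is the number of bytes between the label main and the end of the function.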

sehe