2

Let's say I have a function like this (completely arbitrary; I wrote it in about 30 seconds as an example):

bool exampleAuthetnication(char *a, char *b)
{
    bool didAuthenticate = false;
    if(strcmp(a, b) == 0)
    {
        didAuthenticate = true;
    }

    if(didAuthenticate)
    {
        return true;
    }
    else
    {
        stopExecutable();
        return false;
    }
}

How would I go about reading the first few bytes of this function?

I've come up with this

int functionByteArray[10];
for (int i = 0; i < 10; i++)
{
    functionByteArray[i] = *(int*)(((int)&exampleAuthetnication) + (0x04 * i));
}

The logic is that we take the memory address of our function (in this case exampleAuthetnication()), cast it to an int pointer, and then dereference it to get the value at the position we are currently reading, which we store in functionByteArray. But it does not seem to work properly. What am I doing wrong? Is what I'm trying to accomplish possible?

Coder1337
  • "but it does not seem to work properly" Why? What's the result and what result did you expect? – tkausl Feb 28 '18 at 05:51
  • @tkausl the result seems to be just some random numbers. They do not match up with what they should be based on viewing the function in IDA – Coder1337 Feb 28 '18 at 05:52
  • That's probably because you read 32 bits as little-endian and IDA shows the data as bytes. – tkausl Feb 28 '18 at 05:53
  • @tkausl also, even if I change the first few lines of this function (exampleAuthetnication), the bytes stored in functionByteArray don't change? – Coder1337 Feb 28 '18 at 05:54
  • You don't explain why you want to get the first few bytes of the code, and on which machine and operating system (and ABI) you want to do that – Basile Starynkevitch Feb 28 '18 at 06:10
  • Can you put in your question the unexpected values you are seeing? – Michael Petch Feb 28 '18 at 06:21
  • What do you mean by "first few bytes of this function", exactly? Are you trying to extract parts of the compiled program from itself? – Lightness Races in Orbit Feb 28 '18 at 12:25
  • Related: [How to read binary executable by instructions?](https://stackoverflow.com/questions/49153556/how-to-read-binary-executable-by-instructions) has an answer that hexdumps the first 10 bytes of `main`, by casting a function pointer to a `const char*`. This does work in practice on typical C implementations, although as Basile's answer shows, it's not guaranteed by the standard, and there may be some real implementations where it doesn't work. – Peter Cordes Mar 08 '18 at 04:08

2 Answers

4

In theory (according to the C++11 standard) you are not even guaranteed to be able to cast a function pointer to a data pointer: the conversion is only conditionally supported, because on Harvard architectures code and data sit in different memories and different address spaces. Some operating systems or processors might also forbid reading executable code segments (execute-only memory; see also the related NX bit, which forbids executing data pages).

In practice, on x86-64 (or 32-bit x86) running an operating system like Linux or Windows, a function's code is a sequence of bytes that can be unaligned and sits in the (common) virtual address space of its process. So you should at least have char functionByteArray[40]; and you might use std::memcpy from <cstring> and do something like

std::memcpy(functionByteArray, (char*)&exampleAuthetnication,
            sizeof(functionByteArray));
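A slightly fuller sketch of the same idea, under the assumptions stated above (a mainstream x86 or x86-64 target where the function-to-object pointer conversion is accepted and code pages are readable). The names sampleFunction, firstBytes and toHex are hypothetical helpers made up for this illustration, not part of the question's code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-in for the function from the question.
bool sampleFunction(const char *a, const char *b)
{
    return std::strcmp(a, b) == 0;
}

// Copy the first N bytes of machine code starting at a function's address.
// Assumption: the function-to-object pointer cast is supported and the code
// is readable; the standard does not guarantee either.
template <std::size_t N, typename Fn>
std::array<unsigned char, N> firstBytes(Fn *fn)
{
    assert(fn != nullptr);
    std::array<unsigned char, N> bytes{};
    // memcpy reads byte by byte, so no alignment requirement is imposed.
    std::memcpy(bytes.data(), reinterpret_cast<const void *>(fn), N);
    return bytes;
}

// Format the bytes as space-separated two-digit hex, in memory order.
template <std::size_t N>
std::string toHex(const std::array<unsigned char, N> &bytes)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < N; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling toHex(firstBytes<10>(&sampleFunction)) should then give a byte-oriented listing that can be compared with a disassembler's view of the same build. The actual bytes depend on the compiler and its options, so none are shown here.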

Finally, your code is wrong because, notably on x86-64, int is not the same size as a pointer (so (int)&exampleAuthetnication loses the upper bytes of the address). You should at least use intptr_t. Also, int has stronger alignment constraints than the code does.
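To illustrate both points, here is a small sketch. functionAddress keeps the full address by using std::uintptr_t, and littleEndianWord shows what happens when four consecutive code bytes are combined into one 32-bit value the way a little-endian machine would, which is also the effect tkausl mentions in the comments on the question. Both helper names and the kExamplePrologue bytes are assumptions for illustration; the bytes are those of a typical unoptimized x86-64 prologue (push rbp; mov rbp, rsp), not output taken from the asker's program.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bytes only: 55 = push rbp, 48 89 e5 = mov rbp, rsp.
const unsigned char kExamplePrologue[4] = {0x55, 0x48, 0x89, 0xe5};

// Convert a function pointer to an integer wide enough to hold an address.
// Casting to int instead would truncate it on x86-64.
template <typename Fn>
std::uintptr_t functionAddress(Fn *fn)
{
    static_assert(sizeof(std::uintptr_t) >= sizeof(void *),
                  "uintptr_t must be able to hold a pointer");
    return reinterpret_cast<std::uintptr_t>(fn);
}

// Combine four bytes the way a little-endian CPU does when it loads a
// 32-bit word: the first byte in memory becomes the least significant byte.
std::uint32_t littleEndianWord(const unsigned char *p)
{
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}
```

With the example bytes, littleEndianWord(kExamplePrologue) yields 0xe5894855, i.e. the bytes appear reversed relative to a disassembly listing, and printed as a signed int the value looks like an unrelated negative number.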

BTW, you might also ask your compiler to show the generated assembler code. With GCC, compile your exampleAuthetnication C++ code with g++ -O -fverbose-asm -S and look at the generated .s file.

Notice that the C++ compiler might optimize to the point of "removing" a function from the code segment (e.g. because that function has been inlined everywhere), split the function's code into several pieces, or put the exampleAuthetnication code "inside" another function...

Basile Starynkevitch
  • if I didn't cast to `(int)` I could not compile and would get errors that is why I did it like that. But thanks for the reply! – Coder1337 Feb 28 '18 at 06:14
  • I may be wrong but I think if you use the address of a function that it will use the non-inlined variant. – Michael Petch Feb 28 '18 at 06:27
  • Yes. But the compiler might still optimize to split its code, or emit a jump inside *another* function which has inlined `exampleAuthetnication`. – Basile Starynkevitch Feb 28 '18 at 06:29
  • @Coder1337: The issue is that you used `int` instead of `uintptr_t`. But you don't need to cast to an integer type at all, because you only needed to add, and C++ allows addition between integers and non-`void` pointers. – Peter Cordes Feb 28 '18 at 06:52
  • @BasileStarynkevitch: Yes, some compiler *might* do that, but gcc doesn't. If you take the address of a function, you will get a stand-alone definition of it even if the compilation unit / program didn't otherwise need one. – Peter Cordes Feb 28 '18 at 06:58
  • @PeterCordes: Isn't that up to the linker to decide? Or does GCC stuff it in a larger section, leaving the linker no choice? (I've always found it weird that `-ffunction-sections` works so poorly on GCC) – MSalters Feb 28 '18 at 13:54
  • I am not so sure. IIRC, some weird optimization options might do strange things. – Basile Starynkevitch Feb 28 '18 at 13:56
  • @MSalters: if the function is `static` or `inline`, the compiler has the option of not emitting a stand-alone definition. If not, only link-time optimization could drop it, I think, but it won't be able to if, after optimization, a valid pointer to the function is needed. (But I wouldn't call that "the linker".) – Peter Cordes Feb 28 '18 at 18:36
-1

C++ source code is not a list of instructions for a computer to perform; it is a collection of statements that describe the meaning of a program.

Your compiler interprets these statements and produces an actual sequence of instructions (via the assembly stage) that can actually be executed in our physical reality.

The language used to do so does not provide any facilities for examining the bytes that make up the compiled program. All of your attempts to cast function pointers and the like may randomly give you some similar data, via the magic of undefined behaviour, but the results are just that: undefined.

If you wish to examine the compiled executable, do so from outside of the program. You could use a hex editor, for example.
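If a hex editor is not convenient, the same outside-the-program inspection can be sketched as a small separate tool that reads raw bytes from the executable file on disk. readFileBytes is a hypothetical helper written for this illustration. It assumes you already know the file offset of the function, which you would have to look up separately (for example from a disassembler or symbol listing), since file offsets are not the same as run-time addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read up to `count` bytes starting at byte `offset` of the file at `path`.
// Returns fewer bytes (possibly none) if the file is missing or too short.
std::vector<unsigned char> readFileBytes(const std::string &path,
                                         std::streamoff offset,
                                         std::size_t count)
{
    assert(offset >= 0);
    std::vector<unsigned char> bytes;
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return bytes;  // could not open the file
    }
    in.seekg(offset);
    bytes.resize(count);
    in.read(reinterpret_cast<char *>(bytes.data()),
            static_cast<std::streamsize>(count));
    // Keep only what was actually read.
    bytes.resize(static_cast<std::size_t>(in.gcount()));
    return bytes;
}
```

Because this reads the file rather than the running process, it sidesteps the function-pointer cast entirely.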

Lightness Races in Orbit