So I was working on a project, got a little bored, and started thinking about how to break C really hard: is it possible to trick the compiler into using jumps (goto) for a function call? Maybe, I answered myself. After a bit of work I realised that some pointer stuff wasn't working correctly, but in an (at least for me) unexpected way: the goto wouldn't work as intended. After a little experimenting, I came up with this (comments removed, since I sometimes keep unused code in them when testing):

//author: me, The Array :)

#include <stdio.h>

void * func_return();
void (*break_ptr)(void) = (void *)func_return;

void * func_return(){
    printf("ok2\n");
    break_ptr = &&test2;
    return NULL;
    if(0 == 1){
      test2:
      printf("sh*t\n");
    }
}

void scoping(){
    printf("beginning of scoping\n");
    break_ptr();
    printf("after func call #1\n");
    break_ptr();
    printf("!!!YOU WILL NOT SEE THIS!!!!\n");
}

int main(){
    printf("beginning of programm\n");
    scoping();
    printf("ending programm\n");
}

I used gcc to compile this, as I don't know any other compiler that supports the use of &&. My platform is 64-bit Windows and I compiled it in the most basic way:

gcc.exe "however_you_want_to_call_it.c" -o "however_you_want_to_call_it.exe"

Looking over that code, I expected and wanted it to print "sh*t\n" to the console window (of course the \n will be invisible). But it turns out gcc is somewhat too smart for me? I guess this is what happens when you try to break something. In fact, as the title says, it returns twice:

beginning of programm
beginning of scoping
ok2
after func call #1
ok2
ending programm

It does not return twice the way the fork function does, which would probably print the following stuff twice or something. No, it returns out of the function AND out of the function that called it. So after the second call it does not print "!!!YOU WILL NOT SEE THIS!!!!\n" to the console, but rather "ending programm", as it returned twice. (I want to stress that "ending programm" is printed, so the program does not crash.)

So the reason I posted this here is the following questions:

  1. Why does it not go to/jump to/call the actual test2 label, and instead go to the beginning of that function?
  2. How would I achieve the thing in my first question?
  3. Why does it return twice? I figure it is probably a compiler thing rather than a runtime thing, but I guess I'll wait for someone's answer.
  4. Can the same thing (the returning twice) be achieved the first time "break_ptr" is called, instead of the second time?

I do not know and do not care whether this also works in C++.

Now I can see many ways this could be useful, some malicious and some actually good. For example, you could write an enterprise function which returns from your function for you. Enterprise solutions to problems tend to be weird, so why not make a function which returns your code, idk. Yet it can be malicious, for example when some code returns unexpectedly or even without return values. I can imagine this existing in a DLL file plus a header file which simply reads "extern void *break_ptr();" or something like that. I did not test it. (Yet there are way crueler ways to mess with someone.)

I could not find this documented anywhere on the internet. If you find any links or references about this, please send them to me; I want to learn more about it.

If this is "just" a bug and one of the GNU/gcc people is reading this: please do NOT remove it, as it is too much fun working with these things.

Thank you in advance for your answers and your time, and I am sorry for making this so long. I wanted to make sure everything collected about this is in one place. (Still, I am sorry if I missed something.)

The A
  • First, this is not C. `&&test2` is a GNU extension specific to the gcc compiler. – KamilCuk Feb 21 '21 at 10:25
  • I said so, I think :).. Just read on – The A Feb 21 '21 at 10:27
  • `void (*break_ptr)(void) = (void *)func_return;` The cast is not necessary, and I would even argue that it's wrong. – klutt Feb 21 '21 at 10:33
  • not doing the cast causes gcc to complain that it is the incorrect pointer type.. – The A Feb 21 '21 at 10:44
  • @TheArray Yes, because the pointer IS of the wrong type ;) – klutt Feb 21 '21 at 10:47
  • it is just a dirty workaround I guess. I do not think it changes stuff, if I wrote this line differently, so I do not think it is part of the topic, idk – The A Feb 21 '21 at 11:01
  • @TheArray Thank you for making that mistake. It gave me a real-world example of why casting is a dangerous thing. Read the end of this answer, under "real world example", and you'll see why it's wrong and how you can correct it https://stackoverflow.com/a/63773195/6699433 – klutt Feb 21 '21 at 11:02
  • Oh no, yet still interesting. I tested your recommended solution and it didn't change the output, meaning it still returns twice etc., mainly because L2 is still positioned weirdly, as KamilCuk answered. It is therefore completely unimportant what kind of function this is or what it returns. If you still think differently or have something against this, please send me an example, probably a modification of the code in my question. (I am confident that it barely changes anything, therefore I really don't care what it returns lol) I WILL read your following words! [thinking emoji] – The A Feb 21 '21 at 11:26
  • @TheArray It was not intended to solve the output problems. It was just to show you how to (not) cast. It's not important to this problem, but I have gotten strange results related to that particular error, so it's still very important to understand. Never blindly cast just to silence a warning. It does not matter that you don't care about the return type. If you want to point at a function, it SHOULD be a matching function pointer. Anything else invokes undefined behavior. – klutt Feb 21 '21 at 11:34
  • Also, did you notice the function returns NULL no matter what? I really want to understand your point. EDIT: it took a while for your answer to come through, so this might interfere with yours. EDIT 2: Oh, I think now I get what you mean. I promise you, I will not and have never cast function types unnecessarily like in this example. I would have struggled to find some examples as well, I guess, so I would have taken this one too. – The A Feb 21 '21 at 11:34
  • It's not like that you have any guarantees that it will work just because you don't care about the return value. It DOES invoke undefined behavior. And that means that the standard imposes NO requirements WHATSOEVER on the compiler. And when the compiler optimizes the code it is free to, and will, assume undefined behavior will never happen. – klutt Feb 21 '21 at 11:35
  • Can you show me how this undefined behaviour changes the outcome of my code in a significant way? Maybe a version of the compiler that causes something different to happen. I see what you mean, I hope, yet I would like an example, idk. You seem very patient, so thank you for your time – The A Feb 21 '21 at 11:41
  • I actually edited the wrong answer. Here is the rant about casting. https://stackoverflow.com/a/62563330/6699433 – klutt Feb 21 '21 at 11:42
  • Since the nature of "undefined behavior" is precisely what it says, that the behavior is undefined, it CAN change the output and general behavior in your program in ANY way. Sure, some things are more likely than others. But a typical thing would be different behavior depending on optimization level and if you're in debug mode or not. – klutt Feb 21 '21 at 11:45

2 Answers

From the gcc documentation on labels as values:

You may not use this mechanism to jump to code in a different function. If you do that, totally unpredictable things happen.

The behavior you are seeing is documented. Inspect the generated assembly to see what code the compiler actually generates.

The assembly from godbolt with gcc 10.2 and no optimizations:

break_ptr:
        .quad   func_return
.LC0:
        .string "ok2"
func_return:
        push    rbp
        mov     rbp, rsp
.L2:
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        mov     eax, OFFSET FLAT:.L2
        mov     QWORD PTR break_ptr[rip], rax
        mov     eax, 0
        pop     rbp
        ret
.LC1:
        .string "beginning of scoping"
.LC2:
        .string "after func call #1"
.LC3:
        .string "!!!YOU WILL NOT SEE THIS!!!!"
scoping:
        push    rbp
        mov     rbp, rsp
        mov     edi, OFFSET FLAT:.LC1
        call    puts
        mov     rax, QWORD PTR break_ptr[rip]
        call    rax
        mov     edi, OFFSET FLAT:.LC2
        call    puts
        mov     rax, QWORD PTR break_ptr[rip]
        call    rax
        mov     edi, OFFSET FLAT:.LC3
        call    puts
        nop
        pop     rbp
        ret
.LC4:
        .string "beginning of programm"
.LC5:
        .string "ending programm"
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, OFFSET FLAT:.LC4
        call    puts
        mov     eax, 0
        call    scoping
        mov     edi, OFFSET FLAT:.LC5
        call    puts
        mov     eax, 0
        pop     rbp
        ret

shows that the .L2 label was placed at the top of the function, and that the if (0 == 1) { /* this */ } block was optimized out by the compiler. When you jump to .L2 you land at the beginning of the function body, except that the stack is set up incorrectly, because the push rbp and mov rbp, rsp of the prologue are skipped.

KamilCuk
  • Let me add that breaking stuff was somewhat my intention. It just broke in an unpredicted way. In fact, I hoped to see some weird stuff when a function wants some starting variable, like void * func_return(int a){. I do not want it to start at the beginning of the function at all. – The A Feb 21 '21 at 10:30
  • Inspect the generated assembly. [It's not that long](https://godbolt.org/z/Yo3E5a). Because `printf("****")` is never called gcc optimized it out. Also, when jumping, the stack is incorrectly set. – KamilCuk Feb 21 '21 at 10:32
  • I don't know what to say, except maybe: very interesting... Could it be, that the .L2 label is only set at the beginning of that function to prevent errors, when the 0 == 1 is optimized out? (Since you add this, I guess so...) I guess, I should try it out.. – The A Feb 21 '21 at 10:43
  • `Could it be, that the .L2 label is only set at the beginning of that function to prevent errors, when the 0 == 1 is optimized out?` The code is invalid. Trying to make sense of compiler behavior on invalid input makes no sense: compilers are written to work with valid code, and no one cares how they behave on invalid input (thus the term "undefined behavior": it's not defined what will happen, or why). `gcc` has its sources available online, so inspect the sources and RTL passes and you may find out. – KamilCuk Feb 21 '21 at 10:45
  • Actually the behavior is documented as "unpredictable things will happen". Because you observed unpredictable things, the documentation is valid. I would argue, that `that the .L2 label is only set at the beginning` is done, so that "unpredictable things" _will_ happen as requested, as that is the behavior required from documentation. Gcc could be _specifically_ designed so that such code will behave in unpredictable ways. The only way to find out - read gcc source code. – KamilCuk Feb 21 '21 at 10:47
  • Moving the return and changing the 0 == 1 to 1 == 1 or similar does not cause the L2 label to be moved. Therefore it was a safety feature... very nice – The A Feb 21 '21 at 10:49
  • But I guess we are talking about different things now. I am completely aware that it is undefined/"unpredictable things will happen". I would have had no reason to do this, if it was. I guess I should have said that earlier, sorry. I didn't know such an assembly tool existed, thank you very much. It could make one's life easier. – The A Feb 21 '21 at 10:59

Why does it not go to/jump to/call the actual test2 label, and instead go to the beginning of that function?

Because the documentation says that if you jump to another function "totally unpredictable things happen"

How would I achieve the thing of my first question?

Hard to say, since "jumping into a function" is not really something you should do.

Why does it return twice? I figure it is probably a compiler thing rather than a runtime thing, but I guess I'll wait for someone's answer.

Because returning twice is an element of the set of "unpredictable things"

Can the same thing (the returning twice) be achieved the first time the function "break_ptr" is called, instead of the second time?

See above. What you're doing will cause unpredictable things.

And just to point it out, your code has other flaws that may or may not be part of this. func_return is a function taking an unspecified number of arguments and returning a void pointer. break_ptr is a pointer to a function taking NO arguments and returning void. The proper pointer would be

void * func_return();
void *(*break_ptr)() = func_return; 

Notice three things. Apart from removing the cast, I removed void from the parentheses and added an asterisk. But a better alternative would be

void * func_return(void);
void *(*break_ptr)(void) = func_return; 

The main thing here is, do NOT cast to silence the compiler. Fix the problem instead. Read more about casting here

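Your cast hides the type mismatch, and calling a function through a pointer of an incompatible type invokes undefined behavior, which essentially is the same thing as "unpredictable things happen".

Also, you're missing a return statement at the end of that function, on the path that continues from test2.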

void * func_return(){
    printf("ok2\n");
    break_ptr = &&test2;
    return NULL;
    if(0 == 1){
      test2:
      printf("sh*t\n");
    }
    // What happens here?
}

Falling off the end without a return statement is only safe in a function returning void, but this function returns void*. If control reaches the closing } and the caller uses the return value, the behavior is undefined, which, again, means that unpredictable things happen.

klutt
  • I appreciate your answer, yet I think KamilCuk answered more deeply, idk. I am aware that what I am doing here is undefined and that unpredictable things will happen, as I told KamilCuk for example. Thank you for discussing the function pointer thing, yet it feels more like you want to prevent some future errors instead of solving this problem (the more I think about it, the more confused I get about why you so strongly want me to do it differently). As KamilCuk's answer showed, I am NOT missing a return there; in fact this is part of why it returns twice. There IS a return! – The A Feb 21 '21 at 12:11
  • I think my work here can be summarised as: I WANT to get undefined behaviour to work. I believe you are looking at the wrong things of undefinedness? – The A Feb 21 '21 at 12:12
  • @TheArray KamilCuk is not saying that it's a proper return, but just that the assembly code got a return from the compiler. – klutt Feb 21 '21 at 12:20
  • @TheArray Making "undefined behavior" to work is the wrong way to go. – klutt Feb 21 '21 at 12:21
  • On the first one I can agree, I think, but the second one is your opinion. – The A Feb 21 '21 at 12:22
  • @TheArray It's not, because of what undefined behavior is. You cannot rely on its behavior. If you use the same technique in a slightly different way, it may break. Or it may break because you're compiling with a different version of the same compiler, or using different compiler flags, or changing something in a completely different part of the code. – klutt Feb 21 '21 at 14:14
  • @TheArray If you want this particular feature to work reliably when jumping to another function, I cannot see any easier option than writing your own compiler, and then you can make up your own rules of how this feature should work in your compiler. But if you're using gcc, then it will be unsafe by definition no matter what you do. – klutt Feb 21 '21 at 14:16
  • What speaks against a return function, return_function(), which calls the inline assembly asm("pop %rbp");? This is reliable and could even work with different compilers, since I don't use the gcc-specific &&. And please: yes, I know this will break. – The A Feb 21 '21 at 14:41
  • @TheArray Inline assembly is not my forte, so unfortunately, I cannot answer anything about how to do that. – klutt Feb 21 '21 at 14:50