
I want to write a piece of code that changes itself continuously, even if the change is insignificant.

For example, maybe something like:

for i in 1 to  100, do 
begin
   x := 200
   for j in 200 downto 1, do
    begin
       do something
    end
end

Suppose I want the code to change the line x := 200 to x := 199 after the first iteration, then to x := 198 after the next iteration, and so on.

Is writing such code possible? Would I need to use inline assembly for that?

EDIT: Here is why I want to do it in C:

This program will run on an experimental operating system, and I can't use (or don't know how to use) programs compiled from other languages. The real reason I need such code is that it runs on a guest operating system in a virtual machine. The hypervisor is a binary translator that translates chunks of code and applies some optimizations. It translates each chunk only once: the next time the same chunk is used in the guest, the translator reuses the previous translation. If the code is modified on the fly, the translator notices, marks its previous translation as stale, and is forced to retranslate the same code. That is what I want to achieve: forcing the translator to do many translations. Typically these chunks are the instructions between two branch instructions (such as jump instructions). I think self-modifying code would be a fantastic way to achieve this.

AnkurVj

11 Answers


You might want to consider writing a virtual machine in C, where you can build your own self-modifying code.
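As a rough illustration of the virtual-machine idea, here is a minimal sketch of a bytecode interpreter whose program can patch its own instructions. The instruction set (OP_LOAD_X, OP_PATCH, OP_HALT) and the function vm_run are invented for this example and are not taken from any existing library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set, invented for this sketch. */
enum opcode {
    OP_LOAD_X, /* OP_LOAD_X imm        : x = imm              */
    OP_PATCH,  /* OP_PATCH index delta : code[index] += delta */
    OP_HALT    /* OP_HALT              : stop and return x    */
};

/* Runs the program once from the start and returns the final value of x.
 * Returns -1 if execution runs off the end or hits an unknown opcode. */
int vm_run(int *code, size_t len)
{
    int x = 0;
    size_t pc = 0;

    while (pc < len) {
        switch (code[pc]) {
        case OP_LOAD_X:
            assert(pc + 1 < len);
            x = code[pc + 1];
            pc += 2;
            break;
        case OP_PATCH:
            assert(pc + 2 < len);
            assert(code[pc + 1] >= 0 && (size_t)code[pc + 1] < len);
            code[code[pc + 1]] += code[pc + 2]; /* the program edits itself */
            pc += 3;
            break;
        case OP_HALT:
            return x;
        default:
            return -1;
        }
    }
    return -1;
}
```

With the program { OP_LOAD_X, 200, OP_PATCH, 1, -1, OP_HALT }, each call to vm_run returns the current operand of the load instruction and then decrements it, so successive runs see x = 200, 199, 198, and so on. Note that only the interpreted program changes here; the interpreter's own machine code stays the same, so this alone would not trigger retranslation in a binary translator.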

If you wish to write self-modifying executables, much depends on the operating system you are targeting. You might approach your desired solution by modifying the in-memory program image. To do so, you would obtain the in-memory address of your program's code bytes. Then, you might change the operating system's protection on this memory range, allowing you to modify the bytes without triggering an Access Violation or SIGSEGV. Finally, you would use pointers (perhaps unsigned char * pointers, possibly unsigned long * as on RISC machines) to modify the opcodes of the compiled program.

A key point is that you will be modifying machine code of the target architecture. There is no canonical format for C code while it is running -- C is a specification of a textual input file to a compiler.
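Memory protection is applied to whole pages, so the range whose protection you change has to be widened to page boundaries. The helper below is a small sketch of that arithmetic; the names page_range and covering_pages are made up for this example, and the page size is passed in as a parameter rather than queried from the OS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a page-aligned start address and a length
 * that is a whole number of pages. */
struct page_range {
    uintptr_t start;
    size_t length;
};

/* Widens [addr, addr + len) to the smallest page-aligned range that
 * contains it. page_size would come from the OS (e.g. getpagesize()). */
struct page_range covering_pages(uintptr_t addr, size_t len, size_t page_size)
{
    assert(page_size > 0 && len > 0);

    uintptr_t start = addr - (addr % page_size);  /* round down */
    uintptr_t end = addr + len;                   /* one past the last byte */
    uintptr_t rounded_end = end + (page_size - end % page_size) % page_size;

    struct page_range r = { start, (size_t)(rounded_end - start) };
    return r;
}
```

On a POSIX system, the result could then be handed to something like mprotect((void *)r.start, r.length, PROT_READ | PROT_WRITE | PROT_EXEC). This also covers the case where the code to be patched straddles two pages.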

Heath Hunnicutt
  • "A key point is that you will be modifying machine code of the target architecture." Meaning that you are breaking portability of regular C code. Just a heads up for others reading this (should be obvious). – Engineer May 23 '17 at 02:51

Sorry, I am answering a bit late, but I think I found exactly what you are looking for: https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/

In this article, the author first changes the value of a constant by overwriting the immediate operand of an instruction inside a function's machine code. He then executes shellcode by overwriting the function's body in memory.

Below is the first example:

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

void foo(void);
int change_page_permissions_of_address(void *addr);

int main(void) {
    void *foo_addr = (void*)foo;

    // Change the permissions of the page that contains foo() to read, write, and execute
    // This assumes that foo() is fully contained by a single page
    if(change_page_permissions_of_address(foo_addr) == -1) {
        fprintf(stderr, "Error while changing page permissions of foo(): %s\n", strerror(errno));
        return 1;
    }

    // Call the unmodified foo()
    puts("Calling foo...");
    foo();

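    // Change the immediate value in the addl instruction in foo() to 42.
    // The offset 18 depends on the compiler and flags used; disassemble foo()
    // (e.g. with objdump -d) to find the right offset for your build.
    unsigned char *instruction = (unsigned char*)foo_addr + 18;
    *instruction = 0x2A;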

    // Call the modified foo()
    puts("Calling foo...");
    foo();

    return 0;
}

void foo(void) {
    int i=0;
    i++;
    printf("i: %d\n", i);
}

int change_page_permissions_of_address(void *addr) {
    // Move the pointer down to the start of its page.
    // Arithmetic on void* is a GNU extension, so go through char* instead.
    int page_size = getpagesize();
    addr = (char *)addr - ((unsigned long)addr % page_size);

    if(mprotect(addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) {
        return -1;
    }

    return 0;
}
Labo
  • The issue here is that the code modifies itself by setting Assembly instructions, which means that thereafter it's no longer the cross-platform C that it was meant to be - loses portability. So doesn't _quite_ answer the Q. – Engineer May 23 '17 at 02:48

It is possible, but almost certainly not in a portable way, and you may have to contend with read-only memory segments for the running code and other obstacles put in place by your OS.

Vatine
  • Sounds like they're developing an OS, so portability isn't a concern. – Jonathan M Sep 16 '11 at 16:04
  • mprotect(2) on Linux can be used to allow writes: mprotect(..., PROT_WRITE | PROT_EXEC). The non-portable answer that you're getting at - rewriting the functions themselves - is most certainly possible on many real-world systems, but it's not based on functionality present in C. – Daniel Papasian May 07 '13 at 03:33

This would be a good start. Essentially Lisp functionality in C:

http://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-implemented-in-c/


Depending on how much freedom you need, you may be able to accomplish what you want by using function pointers. Using your pseudocode as a jumping-off point, consider the case where we want to modify that variable x in different ways as the loop index i changes. We could do something like this:

#include <stdio.h>

void multiply_x (int * x, int multiplier)
{
    *x *= multiplier;
}

void add_to_x (int * x, int increment)
{
    *x += increment;
}

int main (void)
{
    int x = 0;
    int i;

    void (*fp)(int *, int);

    for (i = 1; i < 6; ++i) {
        fp = (i % 2) ? add_to_x : multiply_x;

        fp(&x, i);

        printf("%d\n", x);
    }

    return 0;
}

The output, when we compile and run the program, is:

1
2
5
20
25

Obviously, this will only work if you have a finite number of things you want to do with x on each run through. To make the changes persistent (which is part of what you want from "self-modification"), you would want to make the function-pointer variable either global or static. I'm not sure I can really recommend this approach, because there are often simpler and clearer ways of accomplishing this sort of thing.
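To illustrate the persistence point, here is a minimal sketch in which a file-scope static function pointer keeps its value between calls and is swapped after every use. The names step_add, step_double, current_step, and apply_step are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

static int step_add(int x)    { return x + 1; }
static int step_double(int x) { return x * 2; }

/* File-scope static pointer: its value survives between calls. */
static int (*current_step)(int) = step_add;

/* Applies the currently selected step to x, then switches to the other
 * step so that the next call behaves differently. */
int apply_step(int x)
{
    assert(current_step != NULL);

    int result = current_step(x);
    current_step = (current_step == step_add) ? step_double : step_add;
    return result;
}
```

Starting from the initial state, apply_step(1) adds one, the following call doubles its argument, and the one after that adds again. As with the loop version, the machine code itself never changes; only the call target does.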

Pillsy
  • Will this code example _really_ do self-modification? Shouldn't modifying code require writing to the memory locations that contain the code? I mean, this code will be compiled to something where the call goes to one of the two functions after evaluating a condition. But that is static code after all, isn't it? – AnkurVj Sep 16 '11 at 16:14
  • No, you're right, it's static code, and won't serve your particular purpose (which sounds really interesting, BTW). – Pillsy Sep 16 '11 at 16:22
  • Of course, you don't need to set the function pointer in each loop iteration. You can initialize it to some function before, and change it whenever you want. You don't need to stick with one, either.. you could call a list of them sequentially. There's a lot that can be done with this if the idea is extended further.. don't give up on it too quickly.. – Dmitri Sep 16 '11 at 17:12

An interpreted language (not compiled and linked ahead of time like C) might be better for that. Perl, JavaScript, and PHP have the evil eval() function, which might suit your purpose. With it, you could keep a string of code that you constantly modify and then execute via eval().

Jonathan M
  • Okay. But I really need to do it in C code. Could it be possible using assembly instructions written in C via inline assembly? – AnkurVj Sep 16 '11 at 15:36
  • Well, C is a *compiled* language, which means you'll have to compile after each change, link (if necessary) and then execute the new executable file. C really isn't designed for on-the-fly code changes. – Jonathan M Sep 16 '11 at 15:39
  • In your original post, you might explain a bit about why it needs to be in C, if it really, really does. – Jonathan M Sep 16 '11 at 15:40
  • @AnkurVj If you have to ask this kind of question, you are probably not capable of doing it. – Alan Sep 16 '11 at 15:47
  • @Alan, on the contrary, asking such questions is how we *become* able to do such things. – Jonathan M Sep 16 '11 at 15:48
  • @Alan, I have added an explanation of why I want to do this. – AnkurVj Sep 16 '11 at 15:51

The suggestion about implementing LISP in C and then using that is a solid one if portability is a concern. But if you really wanted to, this could also be implemented in the other direction on many systems, by loading your program's machine code into memory and then returning to it.

There are a couple of ways you could attempt to do that. One way is via a buffer overflow exploit. Another would be to use mprotect() to make the code section writable, and then modify compiler-created functions.

Techniques like this are fun for programming challenges and obfuscated competitions, but given how unreadable your code would be combined with the fact you're exploiting what C considers undefined behavior, they're best avoided in production environments.

Daniel Papasian

In standard C11 (read n1570), you cannot write self-modifying code (at least not without undefined behavior). Conceptually at least, the code segment is read-only.

You might consider extending the code of your program with plugins using your dynamic linker. This requires operating-system-specific functions. On POSIX, use dlopen (and probably dlsym to get pointers to the newly loaded functions). You could then overwrite function pointers with the addresses of the new functions.

Perhaps you could use some JIT-compiling library (like libgccjit or asmjit) to achieve your goals. You'll get fresh function addresses and put them in your function pointers.

Remember that a C compiler can generate code of various size for a given function call or jump, so even overwriting that in a machine specific way is brittle.

Basile Starynkevitch

My friend and I encountered this problem while working on a game that self-modifies its code. We allow the user to rewrite code snippets in x86 assembly.

This just requires leveraging two libraries -- an assembler, and a disassembler:

FASM assembler: https://github.com/ZenLulz/Fasm.NET

Udis86 disassembler: https://github.com/vmt/udis86

We read instructions using the disassembler, let the user edit them, convert the new instructions to bytes with the assembler, and write them back to memory. The write-back requires using VirtualProtect on Windows to change page permissions so that the code can be edited. On Unix you have to use mprotect instead.

I posted an article on how we did it, as well as the sample code.

These examples are on Windows using C++, but it should be very easy to make cross-platform and C only.

MrExquisite
Zachary Canann
  • This is not really C. And is very brittle (the C compiler is allowed to compile a call or a branch in various ways, and you'll have different code size even for that jump or call) – Basile Starynkevitch Aug 31 '18 at 05:02

This is how to do it on Windows with C++. You'll have to VirtualAlloc a byte array with read/write protection, copy your code there, and VirtualProtect it with read/execute protection. Here's how you dynamically create a function that just returns 15.

#include <cstdio>
#include <cstring>
#include <windows.h>

int main() {
    // x86-64 machine code for:
    //   xor rax, rax
    //   add rax, 15
    //   ret
    unsigned char bytes[] = { 0x48, 0x31, 0xC0, 0x48, 0x83, 0xC0, 0x0F, 0xC3 };
    SIZE_T size = sizeof(bytes);

    // Allocate writable memory and copy the code into it
    void* meth = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (meth == NULL) {
        printf("Unable to allocate memory!\n");
        return 1;
    }
    memcpy(meth, bytes, size);

    // Switch the memory from read/write to read/execute
    DWORD old_protect;
    if (VirtualProtect(meth, size, PAGE_EXECUTE_READ, &old_protect)) {
        typedef int (*fptr)();
        // Cast the pointer directly; going through long truncates it on 64-bit Windows
        fptr my_fptr = reinterpret_cast<fptr>(meth);
        int number = my_fptr();
        for (int i = 0; i < number; i++) {
            printf("I will say this 15 times!\n");
        }
        VirtualFree(meth, 0, MEM_RELEASE);
        return 0;
    } else {
        printf("Unable to VirtualProtect code with execute protection!\n");
        VirtualFree(meth, 0, MEM_RELEASE);
        return 1;
    }
}

You assemble the code using this tool.
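For comparison, a rough POSIX counterpart in plain C might look like the sketch below. It assumes an x86-64 machine and an OS that lets a mapping go from writable to executable; the function name run_generated_code is invented here, and on any other architecture the function simply reports failure instead of running the bytes.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <string.h>
#include <sys/mman.h>

/* Copies a tiny machine-code routine into fresh memory, makes it
 * executable, and calls it. Returns the routine's result, or -1 on
 * failure or on an architecture other than x86-64. */
int run_generated_code(void)
{
#if defined(__x86_64__)
    /* xor rax, rax ; add rax, 15 ; ret */
    static const unsigned char code[] = {
        0x48, 0x31, 0xC0, 0x48, 0x83, 0xC0, 0x0F, 0xC3
    };

    /* Start with a writable (not executable) anonymous mapping. */
    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* Flip the mapping from read/write to read/execute. */
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) == -1) {
        munmap(mem, sizeof code);
        return -1;
    }

    /* POSIX permits converting the data pointer to a function pointer. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();

    munmap(mem, sizeof code);
    return result;
#else
    return -1;
#endif
}
```

Keeping the mapping writable and executable at different times, rather than both at once, mirrors the VirtualAlloc/VirtualProtect sequence and tends to work on systems that enforce a write-xor-execute policy.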

Jessie Lesbian

While "true" self-modifying code in C is impossible (the assembly way feels like a slight cheat because, at that point, we're writing self-modifying code in assembly and not in C, which was the original question), there might be a pure C way to get a similar effect: statements paradoxically not doing what you think they are supposed to do. I say paradoxically because both the ASM self-modifying code and the following C snippet might not make sense superficially or intuitively, but they are logical once you put intuition aside and analyze them; that discrepancy is what makes a paradox a paradox.

#include <stdio.h>
#include <string.h>

int main()
{
    struct Foo
    {
        char a;
        char b[4];
    } foo;

    foo.a = 42;
    strncpy(foo.b, "foo", sizeof foo.b);  // copies the '\0' too; a count of 3 left b[3] uninitialized
    printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);

    *(int*)&foo.a = 1918984746;
    printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);

    return 0;
}
$ gcc -o foo foo.c && ./foo
foo.a=42, foo.b="foo"
foo.a=42, foo.b="bar"

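First, we set the values of foo.a and foo.b and print the struct. Then we appear to change only foo.a, but observe the output: foo.b has changed as well. The int-sized store through &foo.a also overwrites the first three bytes of foo.b (assuming a 4-byte int on a little-endian machine). Note that this cast relies on undefined behavior, so the result is not guaranteed.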

JonnyRobbie