2

In C, I need a way to execute machine instructions DIRECTLY while a program is running. The only way I know to produce machine instructions in binary is with a hex editor, after which you run the file as an application. How would I write the binary from within a program and then execute it from there, without having to create a new process for execution? It seems like there should be a simple way to do this, but no matter where I look I can't find it.

The only other way I can think of doing this is through inline assembly, but in my current project that would be a drawback; executing the binary directly is the best way to go. (Would this possibly require a driver on Windows? How would it be done on Linux? In other words, a cross-platform method would be nice.)

Thank you.

Griwes
u8sand
  • 1
    Inline assembly has no overhead compared with what you are expecting (which is not quite possible AFAIK), as inline assembly gets compiled into binary. –  Mar 18 '12 at 19:59
  • 1
    All instructions are "in binary" when they're executed. Can you explain what you're trying to accomplish? – Caleb Mar 18 '12 at 19:59
  • 1
    Smash the stack - on purpose. – Robert Mason Mar 18 '12 at 20:00
  • Well, it's not that inline assembly has overhead; it's that the instructions are in binary, so I'd have to convert them to assembly and then execute the assembly. All instructions are "in binary", yet I can't supply the 'binary' myself. If I want to add 1+1 and store it in eax, and I wrote that in binary, where would I put this binary? It also has to work at runtime. For example, what if I accept input, type in my instructions (in binary), and the computer executes them? – u8sand Mar 18 '12 at 20:01
  • 3
    Thanks for the -2 guys, because you don't understand the question. – u8sand Mar 18 '12 at 20:06
  • 4
    What you want to do is how most hackers operate. They find a way to enter data into a program and then convince the program to execute it, [e.g. a buffer overflow attack](http://en.wikipedia.org/wiki/Buffer_overflow). Be prepared for your virus scanner to flag your program as a virus, or for your hardware to prevent you from executing the code ([DEP](http://en.wikipedia.org/wiki/Data_Execution_Prevention)). – Joost Sannen Mar 18 '12 at 20:10
  • Hmm, however I don't need it to overflow any buffers, is that the only way to get the computer to execute the code? Because with overflowing buffers I risk overwriting something else in the code. Unless I just create the whole program in assembly so I can manage memory better.. – u8sand Mar 18 '12 at 20:11
  • 1
    No, you could just jump into it. But then again, you will have to fight security measures. – Joost Sannen Mar 18 '12 at 20:13
  • I don't get security problems when coding in assembly, so why would I get them when putting in the binary directly? – u8sand Mar 18 '12 at 20:15
  • @u8sand: There should be no security issues. Java and .NET just-in-time compilers do this all the time. – Ben Voigt Mar 18 '12 at 20:33
  • Coding in assembly happens at compile time; pulling in a binary and branching to it happens at runtime. The protection system is based on what you came in with before runtime, the compiled code. You have to defeat that in order to change which portion of your allocated memory is executable vs. data. – old_timer Mar 18 '12 at 22:01

4 Answers

5

What you want to do is a bit problematic and is liable to make a lot of people ask "Why are you doing that?"

Assuming you have an operating system WITHOUT memory protection (which is very rare), you can just point a function pointer at an array of bytes and call it. Here's the gist of it:

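/* 16-bit x86 machine code for DOS: mov ax, 0x0013; int 0x10 */
unsigned char code_to_execute[] = "\xB8\x13\x00\xCD\x10";
void (*runCode)(void);

/* The cast tells the compiler "there is a function at this address" */
runCode = (void (*)(void))code_to_execute;

runCode();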

But, there are SO MANY THINGS to worry about when doing something like this. You need to know how your C compiler is setting up function call frames and respect that in your "binary code". It's impossible to create cross-platform code in this manner. Even making it run in multiple C compilers on a single platform would be tricky. And then there's memory protection. Most modern operating systems simply won't let you arbitrarily execute data as code. You need to explicitly mark the memory as executable and many operating systems won't let you do that without special permission. There is no cross-platform way to do this either.
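To make the calling-convention point concrete, here is a minimal sketch, not a definitive implementation. It assumes Linux with GCC or Clang on x86-64 or AArch64; the names `add_fn` and `call_generated_add` are made up for illustration. The bytes only work because they read their arguments from, and leave their result in, the registers the platform ABI prescribes for an `int f(int, int)` call.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical name: the C-level signature the raw bytes must honour. */
typedef int (*add_fn)(int, int);

/* Hypothetical helper: runs a hand-encoded "return a + b" routine.
   Returns 0 and stores the sum in *out, or returns -1 if setup fails. */
int call_generated_add(int a, int b, int *out)
{
#if defined(__x86_64__)
    /* System V AMD64 ABI: a arrives in edi, b in esi, result goes in eax. */
    static const unsigned char code[] = {
        0x8D, 0x04, 0x37,         /* lea eax, [rdi + rsi] */
        0xC3                      /* ret */
    };
#elif defined(__aarch64__)
    /* AAPCS64: a arrives in w0, b in w1, result goes in w0. */
    static const unsigned char code[] = {
        0x00, 0x00, 0x01, 0x0B,   /* add w0, w0, w1 */
        0xC0, 0x03, 0x5F, 0xD6    /* ret */
    };
#else
#error "this sketch only carries machine code for x86-64 and AArch64"
#endif

    /* Get a fresh writable page and copy the bytes into it. */
    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* Switch the page from writable to executable before calling into it. */
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, sizeof code);
        return -1;
    }
    /* Needed on AArch64 so the CPU sees the new instructions; harmless on x86. */
    __builtin___clear_cache((char *)mem, (char *)mem + sizeof code);

    /* ISO C does not define this cast, but POSIX systems support it. */
    *out = ((add_fn)mem)(a, b);

    munmap(mem, sizeof code);
    return 0;
}
```

Calling `call_generated_add(2, 3, &sum)` should leave 5 in `sum`. The same bytes would misbehave under a different convention (for example, Windows x64 passes the first two integer arguments in rcx and rdx), which is exactly the portability problem described above.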

Again, I want to stress that this is really not a good idea. You would be better off using inline assembly language. Or better yet, don't use assembly language at all. Maybe you could explain a little more about your project and why it's important to write "binary code" directly in your C program. It would help us craft an answer or recommendation that could help you considerably.

Andy S
  • Thank you, what I meant by cross-platform was not that it work the same way on every system, but simply that it not be some platform-dependent method like loading a dll or creating a device driver. This is required for an artificial intelligence that has direct connections with the computer. A computer is a computer, so why convert the artificial intelligence's language (binary) to human readable (assembly) only to be converted right back to computer readable (binary). The AI is created FOR the OS, so worrying about how other processors work is not important. – u8sand Mar 18 '12 at 20:20
  • 2
    If you're curious, that example code would actually work if you could compile a DOS COM file and execute it in DOS. It would change the video mode to 320x200, 256 colors. B8 is "mov ax", 1300 is the 16-bit value to move into ax. CD is "call interrupt" and 10 is the video card interrupt for VGA controllers. – Andy S Mar 18 '12 at 20:25
  • Now I was wondering, you don't need to cast it? though I haven't actually tried it out yet. – u8sand Mar 18 '12 at 20:27
  • 1
    Okay, I understand what you're saying about artificial intelligence, but... There are so many other problems with AI, getting "it" to read/write directly to the CPU is the LEAST of your problems. However, should you wish to continue your project, I would recommend you create a virtual machine for your AI, rather than having it run directly on the CPU. You're liable to learn a lot more and the abstraction could be very helpful. Check out: http://en.wikibooks.org/wiki/Creating_a_Virtual_Machine/Register_VM_in_C You can optimize to pure machine code after you succeed/understand your AI code. – Andy S Mar 18 '12 at 20:32
  • So make it into an operating system; this was a thought at one point. The problem that came up is that the AI is ALREADY a big project, and now I would have to create an OS along with it. So I may use a Linux kernel or something and go from there. – u8sand Mar 18 '12 at 20:34
  • You do need to cast it. It's only for the C compiler's benefit. It seems like you should be able to just "call" an arbitrary pointer, but you can't. You need to tell the compiler "there is a function at this address" and then call it. – Andy S Mar 18 '12 at 20:34
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/9019/discussion-between-u8sand-and-andy-s) – u8sand Mar 18 '12 at 20:38
4

This SO question perhaps covers the topic:

How to write self-modifying code in x86 assembly

Don't let people slow you down with the "why are you doing this" questions...

You have to know enough about the language, the operating system, or both to punch through or work within the protection system. Then it is a matter of putting the binary you want to execute in memory (assuming you have done the work to make it position independent and/or dependent on the addressing given/found/acquired/whatever) and branching to it. In C you can declare a function pointer, assign it the address of this binary, and then call the function. (If you have no other way to branch to an address, I usually prefer to write a few lines of asm, linked in, that branch to any arbitrary address I pass them.)

Community
old_timer
2

With the machine code in memory, cast its address to a function pointer. Of course you need to comply with the C calling convention.

On most desktop OSes, you'll need to change memory permissions to mark it executable, e.g. on Windows call VirtualProtect and on Linux mprotect.
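As a rough sketch of that permission change, assuming an x86-64 or AArch64 build and a compiler that defines the usual platform macros: the helper names `make_executable_copy`, `free_executable_copy` and `run_answer` are invented here, and the Windows branch is written from the documented Win32 signatures rather than taken from this thread. The memory is writable while the bytes are copied in and is then flipped to read+execute, so it is never writable and executable at once.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Hypothetical helper: returns a read+execute copy of code, or NULL on failure. */
void *make_executable_copy(const unsigned char *code, size_t len)
{
#ifdef _WIN32
    DWORD old_protect;
    void *mem = VirtualAlloc(NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (mem == NULL)
        return NULL;
    memcpy(mem, code, len);
    if (!VirtualProtect(mem, len, PAGE_EXECUTE_READ, &old_protect)) {
        VirtualFree(mem, 0, MEM_RELEASE);
        return NULL;
    }
    FlushInstructionCache(GetCurrentProcess(), mem, len);
#else
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    /* Matters on AArch64; compiles to nothing useful on x86. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);
#endif
    return mem;
}

/* Hypothetical helper: releases memory from make_executable_copy. */
void free_executable_copy(void *mem, size_t len)
{
#ifdef _WIN32
    (void)len;
    VirtualFree(mem, 0, MEM_RELEASE);
#else
    munmap(mem, len);
#endif
}

/* Runs a tiny "return 42" routine; returns -1 if the copy could not be made. */
int run_answer(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    static const unsigned char code[] = {
        0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
        0xC3                            /* ret */
    };
#elif defined(__aarch64__) || defined(_M_ARM64)
    static const unsigned char code[] = {
        0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
        0xC0, 0x03, 0x5F, 0xD6          /* ret */
    };
#else
#error "this sketch only carries machine code for x86-64 and AArch64"
#endif
    void *mem = make_executable_copy(code, sizeof code);
    if (mem == NULL)
        return -1;
    /* Cast the address to a function pointer, then call it. */
    int result = ((int (*)(void))mem)();
    free_executable_copy(mem, sizeof code);
    return result;
}
```

The `#ifdef` only hides the operating-system difference. The bytes themselves still have to be chosen per processor, and some hardened systems may refuse the permission change altogether.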

Binary machine instructions are not cross-platform. You will need to generate different code for each processor architecture.

No driver is needed as long as the code only requires user-level permissions.

Ben Voigt
  • Well yes I understand binary instructions are not cross-platform. I just need a way to "talk with the computer itself" basically. However the method you state sounds like loading a dll, how shall I get my binary into the program, put it directly into a byte array like 'shell-coding' ? – u8sand Mar 18 '12 at 20:05
  • @u8sand: A byte array will work. So will a string like Andy showed in his answer. It's better if you allocate it into its own page, because execute permission is per-page, but that's not an absolute requirement. – Ben Voigt Mar 18 '12 at 20:31
2

Generating and running code at run-time, within an application, is quite a well understood problem.

You will be able to find lots of information about generating code, and executing it on the fly, by searching the web for "just in time compiler" or "JIT compilation", "dynamic code generation", especially combined with a programming language name, like 'Java'.

Dynamic code generation is one of the hot research topics of the last fifteen years.

The Java run-time system (called the Java Virtual Machine or JVM) uses dynamic compilation technology (called HotSpot) to get dramatically improved (i.e. > 10x faster) performance.

Microsoft use Just-In-Time compilation for .NET languages like C#, but it will likely be harder to find much detailed information.

Ian Piumarta has been developing some very impressive dynamic compilation technology (e.g. Cola) at Viewpoints Research Institute, working with Alan Kay, the 'father' of Smalltalk (Smalltalk is the starting point for modern windowing systems and some types of object-oriented programming languages). Some of this technology was used to speed up the Cairo rendering engine (used in web browsers, etc.) in a Google Summer of Code project.

Ian Piumarta's work might be the most flexible, and compact, and hence a good place to start. Be warned, Ian is incredibly clever, so be prepared to really think hard if you want to use it.

There are a few JITs which generate assembler, which may be exactly what you need (but I have never used them):

Other technologies which might be worth a look include:

Of those, LLVM's JIT and GNU libjit are probably the best documented for use outside of their normal 'host' environment. LLVM is a technology that Apple supports. It is designed to be assembled from a set of libraries so that it can be used to build other systems, but it is aimed at very sophisticated, high-performance solutions, so there is likely quite a steep learning curve. GNU libjit looks to be smaller, and hopefully easier to understand because of that. I have only just discovered Intel's ORP.

HTH

gbulmer