27

I encountered this term for the first time in the Stack Overflow question "C# Theoretical: Write a JMP to a codecave in asm." According to Wiktionary, a code cave is:

an unused block of memory that someone, typically a software cracker, can use to inject custom programming code to modify the behavior of a program.

Did I find the correct definition? If so, is there any legitimate use for a code cave?

Eddie
  • 1
    And a slightly off-topic question: If there is no legitimate use for one, should we vote to close such questions? – Eddie Apr 24 '09 at 18:51
  • 1
    +1, I had the same question myself for that very post. – Adam Robinson Apr 24 '09 at 18:53
  • 15
    @Eddie: No. The answer to a question being "no" doesn't invalidate it as a question. – chaos Apr 24 '09 at 18:56
  • 3
    No. Who is to say we aren't writing anti-virus software? What are we trying to accomplish by closing the question? Are you trying to stop the information from escaping? Knowledge is your friend. – cgp Apr 24 '09 at 18:57
  • 3
    @Chaos and @altCognito: You confirmed my leanings. (Hiding information only hurts the good guys.) Still, I'd hate to even accidentally help someone who was trying to become a black-hat cracker. – Eddie Apr 24 '09 at 20:21

9 Answers

21

One might wish to intentionally create a code cave as a part of using self-modifying code.

Assuming, of course, that one is insane.

chaos
  • 6
    Gotta love programming techniques that, while completely valid, are predicated on the programmer being insane... – Shog9 Apr 24 '09 at 19:03
  • 1
    An interesting answer, but is there any guarantee your code wouldn't be overwritten? Wouldn't someone writing self-modifying code protect that memory space? – cgp Apr 24 '09 at 19:04
  • 1
    Sure. It'd be just as protected as any other part of the program (modulo executable space protection, which could not be applied to the code cave). The cracking utility of a code cave isn't based on it being exempted from OS memory protection; to be used at all, the cracker has to have some way of getting data into it (like running the program under a debugger). – chaos Apr 24 '09 at 19:13
  • 1
    +1: Too funny. It hadn't occurred to me that this could be used as a deliberate self-modifying code mechanism. Wayyy back in the day (mid 80's) this used to be all the rage ... when 4k was a moderate amount of RAM and tricks had to be used to make your code fit into the allotted space. – Eddie Apr 24 '09 at 20:44
  • @cgp: no *guarantee* against overwriting, just the same as other allocations with `malloc` or `mmap` / `VirtualAlloc` for read/write (+exec) pages; the OS knows this process has that page mapped, and won't randomly select it again for future requests for more memory for dynamic allocations. (Or for mapping shared libraries or whatever). – Peter Cordes Jan 14 '22 at 02:17
17

I've used them, although I'd never heard the term code cave until today. The Wiktionary definition suggests that a code cave is something the cracker finds in the executable he or she is attempting to crack. The question you cite doesn't use it that way. Instead, it suggests the code cave is being allocated with VirtualAllocEx to create a brand new block of memory in the target process. That removes the need to search for unused space in the target, and it guarantees you'll have enough space to put all your new code.

Ultimately, I think a "code cave" is just a place to store run-time-generated code. There doesn't have to be any nefarious purpose to that code. And at that point, the question of what a code cave is becomes entirely uninteresting. The interesting parts are what reasons there are for generating code at run time, and what techniques there are for making sure that new code gets run when you want it.

Rob Kennedy
  • 1
    There are many good answers, but I selected yours because you addressed not only my question but also the question that caused me to wonder what a code cave is. (To the folks not selected as the answer who still gave good answers ... I voted up several good answers. Thank you!) – Eddie Apr 24 '09 at 20:19
16

Code caves are usually a by-product of the padding that compilers insert for alignment, and they are often found between functions, sometimes in copious amounts. There should also be code caves between structures and jumps (on some architectures), but usually not in any significant amounts.

You might also search for a block of zeroed memory, but there's no guarantee that the program won't use it.
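As a rough illustration of the search described above, here is a minimal Python sketch that scans a byte buffer for runs of a single repeated filler byte. The function name `find_caves`, the `min_size` threshold, and the choice of filler bytes (`0x00`, `0x90`, `0xCC`) are assumptions made for this example; real padding depends on the compiler and linker, and, as noted, a run of zeros may still be memory the program uses.

```python
from typing import Iterator, Tuple

# Filler bytes often seen as padding in x86 binaries (an assumption for this
# sketch; the actual padding depends on the compiler and linker).
PADDING_BYTES = (0x00, 0x90, 0xCC)


def find_caves(data: bytes, min_size: int = 16) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, length, fill_byte) for each run of one repeated padding byte."""
    i = 0
    n = len(data)
    while i < n:
        fill = data[i]
        if fill not in PADDING_BYTES:
            i += 1
            continue
        start = i
        # Extend the run while the same filler byte repeats.
        while i < n and data[i] == fill:
            i += 1
        if i - start >= min_size:
            yield start, i - start, fill


# Synthetic "section": two fake function bodies separated by INT3 padding,
# followed by a zeroed tail. This is made-up data, not a real executable.
section = (
    b"\x55\x8b\xec\xc3"        # fake function 1
    + b"\xcc" * 12             # alignment padding
    + b"\x55\x8b\xec\x5d\xc3"  # fake function 2
    + b"\x00" * 32             # zeroed tail
)

caves = list(find_caves(section, min_size=8))
for offset, length, fill in caves:
    print(f"offset=0x{offset:04x} length={length} fill=0x{fill:02x}")
```

A candidate found this way is only a candidate: the scan says nothing about whether the bytes sit in an executable section or whether the program reads or writes them at run time.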

I suppose theoretically, if you lost your source code, you could patch your buggy program by using them, and your program wouldn't grow in size.

Edit

To those of you suggesting code caves are only for run-time generated code: that is an incomplete definition. Many times I have written a data structure in a "code cave" and updated pointers to point there, and I suspect I am not the only person to do so.

Unknown
  • 1
    Compilers use padding for alignment. It only becomes a "code cave" if you actually put code there. If you put data there, that's common on ARM where it's called a "literal pool" (read-only data in the `.text` section, so it's reachable with PC-relative LDR instructions). Or you could do the same on other ISAs; I'd hesitate to call it a "code cave". And if it's read/write data, it would be a performance disaster on some ISAs, at least x86 where coherent I-cache is implemented by nuking the pipeline on stores near in-flight instructions. – Peter Cordes Jan 14 '22 at 02:10
  • 1
    It would be more accurate IMO to say "Padding created by compilers to align functions is common, and each one is a potential code cave". Unless the accepted definition of "code cave" is really "anywhere you *could* put code", even if you don't actually do so or have any intention of doing so. – Peter Cordes Jan 14 '22 at 02:12
9

Some legitimate uses: patching live OS binaries without a reboot (MS does this), hooking low-level OS functionality (filesystem, network) for firewalls and antivirus software, and extending an application when you don't have the source code (for example, intercepting low-level OS calls to DrawText so the text can be read aloud for blind people).

Dustin Getz
7

The way it's described here reminds me of patchpoints -- a legit use.

Dan
5

I'm unfamiliar with the term, but hot-patching mechanisms could use reserved space to store code patches. You hook into the defective function and redirect it to the new, improved function. This can be done on the fly without taking down critical equipment (such as large telecom switches).
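To make the redirect step concrete, here is a small Python sketch that only computes the bytes of an x86 near jump (`E9` followed by a 32-bit displacement measured from the end of the 5-byte instruction). It does not patch any process. The helper name `encode_jmp_rel32` and the addresses are hypothetical, chosen purely for illustration, and a real hot-patching mechanism has more to handle (instruction boundaries, threads executing the old code, memory protection).

```python
import struct

JMP_REL32_SIZE = 5  # one opcode byte (0xE9) plus a 32-bit displacement


def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode an x86 near jump located at address src that lands on dst."""
    # The displacement is relative to the address of the *next* instruction,
    # i.e. the byte just past the 5-byte jump.
    rel = dst - (src + JMP_REL32_SIZE)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return struct.pack("<Bi", 0xE9, rel)


# Hypothetical addresses: the defective function and the reserved patch area.
OLD_FUNCTION = 0x401000
PATCH_AREA = 0x405000

# Jump written over the start of the old function, redirecting to the patch.
hook = encode_jmp_rel32(OLD_FUNCTION, PATCH_AREA)

# Jump placed at offset 0x10 in the patch area, returning to just past the hook.
back = encode_jmp_rel32(PATCH_AREA + 0x10, OLD_FUNCTION + JMP_REL32_SIZE)

print(hook.hex(), back.hex())
```

The range check matters in practice: a rel32 jump can only reach targets within about 2 GiB of the jump itself, which is one reason patch space is often reserved close to the code it may replace.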

DanM
4

It can be used to inject code at runtime. It can be used to write self-modifying code in static languages, assuming that the OS lets you (NX bit not set, etc.). There are uses for it, but it's not something you should be thinking about in your typical business app.

jfclavette
3

That sounds like the correct definition to me.

As for a legitimate use, let me say this: Don't do it unless you are simply experimenting for the sake of experimenting, and are willing to accept the consequences.

There is no way that this type of thing should ever go into production code:

  1. It is an enormous potential security problem. If it is possible to inject code into memory and then execute it, a malicious attacker can theoretically do, well, whatever they like.
  2. It is a code maintenance nightmare and debugging nightmare. If the code that ends up being run can change during runtime, it becomes almost impossible to track down errors and bugs.
e.James
  • 2
    "If it is possible to inject code into memory and then execute it": on Windows, you can do this to every single process, and I presume you can do it with root privileges on any OS. – Dustin Getz Aug 14 '09 at 18:03
  • 1
    "It is a code maintenance nightmare and debugging nightmare": no sane coder is going to be injecting code at runtime if they can modify the source. You do this when you're an absolute expert with no alternatives. – Dustin Getz Aug 14 '09 at 18:06
  • 1
    @Dustin Getz, Re point #1: An experienced programmer with root access can do just about anything. I am referring to the dangers of having an unprotected code cave in production software, which could leave the program open to malicious users who do not have root access. – e.James Aug 14 '09 at 19:57
  • 1
    @Dustin Getz, Re point #2: I agree. As I said in the second line of my answer, you have to be willing to accept the consequences. The rest of the warnings are there for the coders who are not sane, and I can tell you from experience that they do exist. – e.James Aug 14 '09 at 20:02
  • 1
    The presence of a code cave has zero impact on the security of an app. They're just compiler artifacts. – Dustin Getz Aug 14 '09 at 21:22
  • 1
    I don't think that is correct, especially in the context of this question. The question that Eddie links to describes an intentional code cave, as do many of the answers. If your production software includes a facility to inject and execute code, that is undoubtedly a security concern, and not simply a compiler artifact. – e.James Aug 14 '09 at 22:37
  • 3
    This is 3 years old - but wrong nevertheless. If your production software includes a facility to inject and execute code, then a possible attacker will still need some way to exploit that facility. For example, he would need a way to access the memory of the application, either by injecting code (i.e. a DLL), by writing to the application's memory through another app, or by directly modifying the hex code of the app. In any of those cases he could just create his own code cave or use one of those compiler-artifact code caves instead. So it _is_ correct that the presence of a code cave has zero impact. – Sascha Hennig Apr 04 '12 at 01:26
3

Self-modifying code should not be considered lightly, but can sometimes bring big performance gains. If you've been programming for very long, you've probably used it without realizing it.

Prior to the widespread use of the 486 and higher, many PCs did not include hardware floating-point support. This left people writing programs involving floating point with a dilemma. If they compiled their program to use in-line floating-point instructions, it would run fast on a machine with a floating-point processor, and not at all on machines without one. If they compiled their program with software floating-point emulation, it would run on all machines, but slowly even on machines with hardware floating point.

Many compiler libraries used an interesting trick with self-modifying code. The default behavior was to put a trap instruction where a floating-point operation was needed. The trap handler would either emulate the instruction in software, or, if it detected it was running on a machine with floating-point hardware, it would modify the code by replacing the trap instruction with the appropriate hardware floating-point instruction and execute it. The result was software that ran on all machines, and ran almost as fast on a machine with floating-point hardware as if the code had been compiled to use floating-point hardware directly (since most floating-point-intensive operations occur in loops that are executed many times).
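The trap-and-patch idea can be mimicked with a toy interpreter. The Python sketch below is only an analogy, not how any real compiler library did it: the opcode names `TRAP_FADD` and `HW_FADD`, the `run` function, and the `has_fpu` flag are all invented for this example. It shows the key property, namely that the trap is paid once per instruction site and later passes over the same code run the patched instruction directly.

```python
from typing import List, Tuple

# Invented opcode names for this toy model.
TRAP_FADD = "TRAP_FADD"  # placeholder emitted by the "compiler"
HW_FADD = "HW_FADD"      # stands in for a native floating-point instruction

Instruction = Tuple[str, float]


def soft_fadd(a: float, b: float) -> float:
    """Stand-in for a slow software emulation routine."""
    return a + b


def run(program: List[Instruction], has_fpu: bool) -> Tuple[float, int]:
    """Execute the program once; return (accumulator, number of traps taken)."""
    acc = 0.0
    traps = 0
    for pc, (op, operand) in enumerate(program):
        if op == TRAP_FADD:
            traps += 1
            if has_fpu:
                # Patch the instruction in place so later passes skip the trap,
                # then perform the operation as the hardware would.
                program[pc] = (HW_FADD, operand)
                acc += operand
            else:
                # No hardware support: emulate and leave the trap in place.
                acc = soft_fadd(acc, operand)
        elif op == HW_FADD:
            acc += operand
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc, traps


with_fpu = [(TRAP_FADD, 1.5), (TRAP_FADD, 2.5)]
first_pass = run(with_fpu, has_fpu=True)
second_pass = run(with_fpu, has_fpu=True)

without_fpu = [(TRAP_FADD, 1.5), (TRAP_FADD, 2.5)]
emulated_first = run(without_fpu, has_fpu=False)
emulated_second = run(without_fpu, has_fpu=False)

print(first_pass, second_pass, emulated_first, emulated_second)
```

With the hardware flag set, only the first pass takes traps; without it, every pass does, which matches the trade-off described above.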

Stephen C. Steel
  • 1
    I knew about self-modifying code. (Used to play with ASM in the early 80s on microprocessors. 6809, 6502, Z-80, etc.) But hadn't considered a code cave as a mechanism for allowing purposeful self modifying code! – Eddie Apr 24 '09 at 20:46
  • This is unrelated to code caves. This is self-modification of the main code that was statically generated (in the `.text` section in modern terminology.) It is an interesting note about an automated and sane use of SMC, though. (Fun fact: the Linux kernel can patch itself on startup, e.g. on x86 to replace `lock` prefixes with `nop` if booted on a single-processor machine, in code other than drivers that need atomic RMW on device memory rather than memory other "threads" will see.) – Peter Cordes Jan 14 '22 at 02:21