
Throughout various code bases, I have seen debug builds fill allocated memory with NULL...

memset(ptr, NULL, size);

Or with 0xDEADBEEF...

memset(ptr, 0xDEADBEEF, size);
  1. What are the advantages of using each one, and what is the generally preferred way to achieve this in C/C++?
  2. If a pointer was assigned a value of 0xDEADBEEF, couldn't it still dereference to valid data?
Skyler Saleh
    Maybe an answer of [this](http://stackoverflow.com/questions/1296843/what-is-the-difference-between-null-0-and-0) question will help you... maybe... – JiminP May 06 '11 at 06:34
  • Why the `memset`? Why not just `ptr = NULL`? – fredoverflow May 06 '11 at 06:35
  • @Fred: the point is to mark the memory pointed to by ptr, not the pointer itself. Typically, the ptr is set to NULL after the memory that it points to has been marked. – sean e May 06 '11 at 06:36
  • Neither. Stop having pointers to things that don't exist. – GManNickG May 06 '11 at 06:36
  • @FredOverflow, It's not a pointer. It's a buffer of memory. – Eric Z May 06 '11 at 06:37
  • @FredOverflow: I believe this question is about initializing memory when using a custom allocator. – EboMike May 06 '11 at 06:37
  • @GMan: I assume this is about a custom allocator, i.e. what ends up getting called when you call `new MyClass`, before the constructor actually gets called. – EboMike May 06 '11 at 06:37
  • @Ebo: I don't understand, why initialize the memory at all then? It's just going to be over-written. – GManNickG May 06 '11 at 06:38
  • @GMan: He said "debug build". See my explanation below. It's very common in debug builds to initialize memory with a value that clearly identifies uninitialized memory, like 0xcdcdcdcd (which is what I believe Microsoft's debug allocator uses). It's extremely useful. – EboMike May 06 '11 at 06:40
  • @EboMike: marking also happens when memory is freed in VC debug builds. – sean e May 06 '11 at 06:41
  • @Ebo: Hm, right. I never really had much of a problem there, so I'll just stay out of this one. :) – GManNickG May 06 '11 at 06:42
  • @sean: Exactly, although with a different value (0xdddddddd IIRC) to clearly identify "deleted memory" when looking at it in the debugger. – EboMike May 06 '11 at 06:43
  • @Ebo: and a different signature for buffer bounds (dead man's zone) – sean e May 06 '11 at 06:46
  • @sean: Guard words, correct. Freeing memory will also verify that the guard words around the allocation are still intact, and assert if that's not the case. The first line of defense in trapping buffer and array overruns. – EboMike May 06 '11 at 06:47
  • One thing to emphasize here: pointers are **NOT** usually set to magic values (as it does not help). What they point *to* is usually painted with a magic value by the memory allocator to indicate its state (not allocated / just allocated / released). This is not usually done by the program but rather by the memory allocator. – Martin York May 06 '11 at 07:55
  • Thing is: assuming `CHAR_BIT` is 8, `memset(ptr, 0xDEADBEEF, size);` and `memset(ptr, 0xEF, size);` have the exact same effect. – pmg May 06 '11 at 08:39
  • @trinithis: the [description of `memset()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/memset.html) (7.21.6.1 in [the Standard](http://www.open-std.org/JTC1/sc22/wg14/www/docs/n1256.pdf)) gives the prototype as `void *memset(void *s, int c, size_t n);` and says: "The memset() function shall copy c (converted to an **unsigned char**) into each of the first n bytes of the object pointed to by s." *Furthermore, I tested it and verified my implementation behaves as the Standard describes.* – pmg May 07 '11 at 11:26
  • if needed, initialize to [0xCC](http://stackoverflow.com/questions/370195/when-and-why-will-an-os-initialise-memory-to-0xcd-0xdd-etc-on-malloc-free-new) instead – phuclv May 13 '14 at 10:25

10 Answers

  1. Using either memset(ptr, NULL, size) or memset(ptr, 0xDEADBEEF, size) is a clear indication that the author did not understand what they were doing.

    Firstly, memset(ptr, NULL, size) will indeed zero-out a memory block in C and C++ if NULL is defined as an integral zero.

    However, using NULL to represent the zero value in this context is not an acceptable practice. NULL is a macro introduced specifically for pointer contexts. The second parameter of memset is an integer, not a pointer. The proper way to zero-out a memory block would be memset(ptr, 0, size). Note: 0 not NULL. I'd say that even memset(ptr, '\0', size) looks better than memset(ptr, NULL, size).

    Moreover, the most recent (at the time of writing) C++ standard, C++11, allows NULL to be defined as nullptr. A nullptr value is not implicitly convertible to int, which means that the above code is not guaranteed to compile in C++11 and later.

    In the C language (and your question is tagged C as well) the macro NULL can expand to (void *) 0. Even in C, (void *) 0 is not implicitly convertible to int, which means that in the general case memset(ptr, NULL, size) is simply invalid code in C.

    Secondly, even though the second parameter of memset has type int, the function interprets it as an unsigned char value. This means that only the lowest byte of the value is used to fill the destination memory block. For this reason memset(ptr, 0xDEADBEEF, size) will compile, but will not fill the target memory region with 0xDEADBEEF values, as the author of the code probably naively hoped. memset(ptr, 0xDEADBEEF, size) is equivalent to memset(ptr, 0xEF, size) (assuming 8-bit chars). While this is probably good enough to fill some memory region with intentional "garbage", things like memset(ptr, NULL, size) or memset(ptr, 0xDEADBEEF, size) still betray a major lack of professionalism on the author's part.

    Again, as other answers have already noted, the idea here is to fill the unused memory with a "garbage" value. Zero is certainly not a good idea in this case, since it is not "garbage-y" enough. When using memset you are limited to one-byte values, like 0xAB or 0xEF. If this is good enough for your purposes, use memset. If you want a more expressive and unique garbage value, like 0xDEADBEEF or 0xBAADF00D, you won't be able to use memset with it. You'll have to write a dedicated function that can fill a memory region with a 4-byte pattern (see the sketch after this list).

  2. A pointer in C and C++ cannot be assigned an arbitrary integer value (other than a null pointer constant, i.e. zero). Such an assignment can only be achieved by forcing the integral value into the pointer with an explicit cast. Formally speaking, the result of such a cast is implementation-defined. The resultant value can certainly point to valid data.
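
To illustrate the point about one-byte fill values, here is a minimal sketch of such a dedicated fill function. The name fill_pattern32 is invented for this illustration, and the code assumes CHAR_BIT is 8:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    // Hypothetical helper: tile a 4-byte pattern over a memory region.
    // Any tail bytes (size % 4) receive a partial copy of the pattern.
    static void fill_pattern32(void *dst, uint32_t pattern, size_t size)
    {
        unsigned char *p = static_cast<unsigned char *>(dst);
        unsigned char bytes[4];
        memcpy(bytes, &pattern, 4);   // in-memory byte order follows the host CPU
        for (size_t i = 0; i < size; ++i)
            p[i] = bytes[i % 4];
    }

A call like fill_pattern32(ptr, 0xDEADBEEFu, size) then does what the memset in the question was presumably meant to do; note that the byte order of the pattern as seen in a memory dump depends on the endianness of the platform.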

AnT stands with Russia
  • Normally your answer 1 is correct and 2 is incorrect, but there could be implementations with 32-bit chars, which would make `memset(ptr, 0xDEADBEEF, size)` mean just that. Also, on some platforms there are alignment requirements that mean dereferencing `0xDEADBEEF` would fail in some cases (and the implementation might not produce non-aligned pointers). – skyking Apr 07 '16 at 08:11
  • @skyking: There's nothing "incorrect" in my second point. Meanwhile, good point about implementations with 32-bit chars. – AnT stands with Russia Apr 07 '16 at 15:20
  • Well, that depends on what you mean by normal and valid data; it was a bit clumsy of me to express it that way. What is clear is that on most platforms `0xDEADBEEF` can't point to data that is required or chosen (by the compiler) to reside at an aligned address. It also depends on what's meant by "valid data": if you only require that you're able to access the data without a segmentation fault, then on an x86 platform you would pass, but if you require that the data is actually valid, then it's a rather less normal case, since the data would need to be `bool` or `char`. – skyking Apr 08 '16 at 07:03
  • ...in addition, one could make the case that x86_64 is more normal these days, and then you would end up with a full 64-bit value that is not valid as an address. – skyking Apr 08 '16 at 07:04

Writing 0xDEADBEEF or another non-zero bit pattern is a good way to catch both write-after-delete and read-after-delete bugs.

1) Write after delete

By writing a specific pattern you can check whether a block that has already been deallocated was written over later by buggy code. In our debug memory manager we use a free list of blocks, and before recycling a memory block we check that our custom pattern is still written all over the block. Of course it's somewhat "late" when we discover the problem, but still much earlier than it would be discovered without the check. Also, we have a special function that is called periodically (and that can also be called on demand) which just goes through the list of all freed memory blocks checking their consistency, so we can call it often when chasing a bug. Using 0x00000000 as the value wouldn't be as effective, because zero may be exactly the value that buggy code wants to write into the already deallocated block, e.g. zeroing a field or setting a pointer to NULL (it's much less likely that the buggy code wants to write 0xDEADBEEF).
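
As a minimal sketch of what such a check can look like (the block layout and all names here, such as FreeBlock and kFreedByte, are invented for illustration; this is not the actual memory manager described above):

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    static const unsigned char kFreedByte = 0xEF;  // assumed marker for freed memory

    struct FreeBlock {        // hypothetical free-list node; the payload
        FreeBlock *next;      // is assumed to follow the header in memory
        size_t     size;      // payload size in bytes
    };

    // Called when a block is put on the free list.
    static void poison_block(FreeBlock *b)
    {
        memset(b + 1, kFreedByte, b->size);
    }

    // Called before recycling a block, and periodically over the whole list.
    static void check_block(const FreeBlock *b)
    {
        const unsigned char *p = reinterpret_cast<const unsigned char *>(b + 1);
        for (size_t i = 0; i < b->size; ++i)
            assert(p[i] == kFreedByte && "write-after-delete detected");
    }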

2) Read after delete

Leaving the content of a deallocated block untouched, or even writing just zeros, will increase the possibility that someone reading the content of a dead memory block will still find the values reasonable and compatible with invariants (e.g. a NULL pointer, since on many architectures NULL is just binary zeroes, or the integer 0, the ASCII NUL char, or a double with value 0.0). By writing "strange" patterns like 0xDEADBEEF instead, most code that accesses those bytes in read mode will probably find strange, unreasonable values (e.g. the integer -559038737, or a double with value -1.1885959257070704e+148), hopefully triggering some other self-consistency check or assertion.

Of course nothing is really specific to the bit pattern 0xDEADBEEF; actually we use different patterns for freed blocks, the before-block area and the after-block area, and our memory manager also writes another (address-dependent) bit pattern to the content part of any memory block before giving it to the application (this is to help find uses of uninitialized memory).
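
As a sketch of that multi-pattern idea, here is a toy allocator with guard zones. All names, constants and the layout are invented for illustration (for comparison, Microsoft's CRT debug heap is known to use 0xCD for fresh memory, 0xDD for freed memory and 0xFD for guard bytes):

    #include <stdlib.h>
    #include <string.h>

    static const unsigned char kUninitByte = 0xCD;  // fresh, uninitialized payload
    static const unsigned char kFreedByte  = 0xDD;  // freed payload
    static const unsigned char kGuardByte  = 0xFD;  // guard zones around the payload
    static const size_t        kGuardSize  = 8;

    // Hypothetical debug allocation: [front guard][payload][rear guard]
    void *debug_alloc(size_t size)
    {
        unsigned char *raw =
            static_cast<unsigned char *>(malloc(size + 2 * kGuardSize));
        if (!raw) return nullptr;
        memset(raw, kGuardByte, kGuardSize);                     // front guard
        memset(raw + kGuardSize, kUninitByte, size);             // payload marker
        memset(raw + kGuardSize + size, kGuardByte, kGuardSize); // rear guard
        return raw + kGuardSize;
    }

    // Hypothetical matching free; a real manager would record the size itself.
    void debug_free(void *p, size_t size)
    {
        unsigned char *raw = static_cast<unsigned char *>(p) - kGuardSize;
        for (size_t i = 0; i < kGuardSize; ++i)       // verify guards are intact
            if (raw[i] != kGuardByte || raw[kGuardSize + size + i] != kGuardByte)
                abort();                              // overrun detected
        memset(p, kFreedByte, size);                  // paint the dead block
        free(raw);
    }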

6502

I would definitely recommend 0xDEADBEEF. It clearly identifies uninitialized variables and accesses through uninitialized pointers.

Because the value is odd, dereferencing a 0xDEADBEEF pointer will definitely crash on the PowerPC architecture when loading a word, and will very likely crash on other architectures, since the memory is likely to be outside the process' address space.

Zeroing out memory is a convenience since many structures/classes have member variables that use 0 as their initial value, but I would very much recommend initializing each member in the constructor rather than using the default memory fill. You will really want to be on top of whether or not you properly initialized your variables.
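
For example, a trivial illustrative class (not taken from the question) whose correctness does not depend on whatever the debug or release allocator wrote into its memory:

    #include <stddef.h>

    // Every member gets an explicit initial value in the constructor,
    // so the allocator's fill pattern never leaks into program logic.
    class Connection {
    public:
        Connection() : socket_(-1), bytes_sent_(0), peer_(NULL) {}
    private:
        int         socket_;
        size_t      bytes_sent_;
        const char *peer_;
    };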

EboMike
  • I didn't downvote, but I'd say the statement "the memory is likely to be outside the process' address space" is certainly dead wrong. On any 32-bit architecture `0xDEADBEEF` is guaranteed to be inside the process address space, by definition. – AnT stands with Russia May 07 '11 at 18:26
  • However, while dereferencing a pointer into the zero page usually results in a segfault, the same might not be true for 0xDEADBEEF; there, the main reason dereferencing could result in an error is that the address is not 4-byte aligned (and that's probably also the reason PPC doesn't like it). – Jasper Bekkers May 11 '11 at 16:40
  • @Jasper: It might not be true for dereferencing 0 either, and on Windows it will result in an error because that memory doesn't belong to the process. However, the point isn't just to force a segfault. It's to create a value that says "this is uninitialized" when a bug occurs and you're tracking it down. If that bug happens to be a segfault, then great, you caught it early. – Dennis Zickefoose May 11 '11 at 17:08

http://en.wikipedia.org/wiki/Hexspeak

These "magic" numbers are are a debugging aid to identify bad pointers, uninitialized memory etc. You want a value that is unlikely to occur during normal execution and something that is visible when doing memory dumps or inspecting variables. Initializing to zero is less useful in this regard. I would guess that when you see people initialize to zero it is because they need to have that value at zero. A pointer with a value of 0xDEADBEEF could point to a valid memory location so it's a bad idea to use that as an alternative to NULL.

Guy Sirton
  • Whether `0xDEADBEEF` can point to valid memory or not depends on the implementation: it can't point to valid memory under Windows, for example (since Windows maps kernel code into this address). And it's unaligned, so it generally can't point to the beginning of an object. It's a good choice for uninitialized memory (as opposed to a null pointer; your distinction of the two is good). – James Kanze May 06 '11 at 08:36

One reason to null the buffer or set it to a special value is that you can easily tell in the debugger whether the buffer content is valid or not.

Dereferencing a pointer with the value 0xDEADBEEF is almost always dangerous (it will probably crash your program/system), because in most cases you have no idea what is stored there.

Eric Z
  • It's not really so much "what is stored [at address 0xDEADBEEF]" being inherently mysterious or dangerous, as that the address is unlikely to be part of your virtual address space, causing a memory access violation / SIGSEGV or similar. Still, in my experience it's more common for memory content to be overwritten with DEADBEEF than for pointers to be loaded with it, though of course a pointer to an overwritten structure therefore becomes DEADBEEF indirectly.... – Tony Delroy May 06 '11 at 08:41
  • ...except that, of course, `memset` cannot be used to overwrite a memory region with a 4-byte pattern – AnT stands with Russia May 07 '11 at 19:01

0xDEADBEEF is an example of hexspeak. With it, you as a programmer intentionally convey an error condition.

jeroenh
  • I already knew that; my question is whether I should use 0xDEADBEEF for uninitialized memory over null. – Skyler Saleh May 06 '11 at 06:49
  • a bit lame to downvote for that; it wasn't clear from your question that you already knew this. Although clearly not THE answer to your question, I think it's still useful and on-topic information. – jeroenh Aug 14 '11 at 14:56

I would personally recommend using NULL (or 0x0), as it represents null as expected and comes in handy in comparisons. Imagine you are working with a char * and somewhere along the way a value ends up as 0xDEADBEEF for some reason (I don't know why); at least your debugger will come in very handy to tell you whether a value is 0x0.

Xolve
  • The problem is the amount of data that has a valid value of 0x00. When you see 0xDEADBEEF, or remnants of it, in the debugger, you know you have screwed up. When you see lots of 0's, you have no idea. – mattnz May 06 '11 at 09:07

I would go for NULL, because it's much easier to mass zero-out memory than to go through later and set all the pointers to 0xDEADBEEF. In addition, there's nothing at all stopping 0xDEADBEEF from being a valid memory address on x86; admittedly it would be unusual, but far from impossible. NULL is more reliable.

Ultimately, NULL is the language convention; 0xDEADBEEF just looks pretty, and that's it. You gain nothing from it. Libraries will check for NULL pointers; they don't check for 0xDEADBEEF pointers. In C++ the idea of the null pointer isn't even tied to a zero value, merely indicated by the literal zero, and in C++0x there are nullptr and nullptr_t.
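
A short illustration of that last point, using the names C++11 eventually settled on:

    #include <iostream>

    void f(int)    { std::cout << "f(int)\n"; }
    void f(char *) { std::cout << "f(char *)\n"; }

    int main()
    {
        f(0);        // picks f(int): the literal 0 is an int first
        f(nullptr);  // picks f(char *): nullptr converts only to pointer types
    }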

Puppy
  • Zeroing memory will increase the likelihood that a read-after-delete or write-after-delete will go unnoticed. – 6502 May 06 '11 at 06:51
  • The fact that libraries check for `NULL` pointers is why you probably shouldn't do this, unless you are 100% sure the behavior will continue into release builds. If you forget to initialize some data, the surrounding code will happily ignore it, because you probably intended to initialize it to zero anyhow. Then, you switch to release build, and it goes back to actually looking like uninitialized data and all your `if(!p)` checks become worthless. Just be sure to pick a value that your system guarantees is invalid; `0` is generally not the only option. – Dennis Zickefoose May 06 '11 at 06:57

Note that the second argument to memset is treated as a byte; that is, it is implicitly converted to an unsigned char. 0xDEADBEEF would on most platforms convert to 0xEF (and to something else on some odd platform).

Also note that the second argument formally has type int, which NULL is not required to be.

Now for the advantages of doing this kind of initialization. First, of course, the behavior becomes more deterministic (even if we end up in undefined behavior this way, the behavior would in practice be consistent).

Having deterministic behavior means that debugging becomes easier: when you find a bug, you "only" have to provide the same input and the fault will manifest itself.

Now, when you select which value to use, you should select a value that will most likely result in bad behavior, meaning that use of the uninitialized data is more likely to result in an observable fault. This means that you have to use some knowledge of the platform in question (although many platforms behave quite similarly).

If the memory is used to hold pointers, then having cleared the memory means you get a NULL pointer, and normally dereferencing that results in a segmentation fault (which will be observed as a fault). However, if you use it in another way, for example as an arithmetic type, you will get 0, and for many applications that is not an odd value at all.

If you instead use 0xDEADBEEF you will get a quite large integer; when interpreting the data as floating point it will also be quite a large number (IIRC). If interpreting it as text, it will be very long and contain non-ASCII characters, and if you use UTF-8 encoding it will likely be invalid. Now, if used as a pointer, on some platforms it would fail alignment requirements for some types; also, on some platforms that region of memory might be mapped out anyway (note that on x86_64 the value of the pointer would be 0xDEADBEEFDEADBEEF, which is out of range for an address).

Note that while filling with 0xEF will have pretty much the same properties, if you want to fill the memory with 0xDEADBEEF itself you need to use a custom function, since memset doesn't do the trick.
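
To make the reinterpretation argument concrete, here is a small self-contained demo. It assumes 8-bit chars and IEEE-754 doubles; since every byte of the fill is identical, endianness does not change the result:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main()
    {
        // What memset(ptr, 0xDEADBEEF, n) really writes: the single byte 0xEF.
        int32_t as_int;
        memset(&as_int, 0xEF, sizeof as_int);
        printf("as int:    %d\n", (int)as_int);   // -269488145

        double as_double;
        memset(&as_double, 0xEF, sizeof as_double);
        printf("as double: %g\n", as_double);     // a huge-magnitude garbage value
        return 0;
    }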

skyking

Vote me down if this is too opinion-y for StackOverflow, but I think this whole discussion is a symptom of a glaring hole in the toolchain we use to make software.

Detecting uninitialized variables by initializing memory with "garbage-y" values detects only some kinds of errors in some kinds of data.

And detecting uninitialized variables in debug builds but not in release builds is like following safety procedures only when testing an aircraft and telling the flying public to be satisfied with "well, it tested OK".

WE NEED HARDWARE SUPPORT for detecting uninitialized variables: something like an "invalid" bit that accompanies every addressable unit of memory (a byte on most of our machines), which is set by the OS on every byte that VirtualAlloc() (et al., or equivalents on other OSes) hands over to applications, which is automatically cleared when the byte is written, and which causes an exception if the byte is read first.

Memory is cheap enough for this and processors are fast enough for this. This would end the reliance on "funny" patterns and keep us all honest to boot.

jeff slesinger
  • This hardware support would come down to relying on software being written correctly to make use of it. In general the hardware has no way of knowing when a variable goes from being initialized to uninitialized, because that is a software concept. Further, you're talking about an extremely non-trivial amount of resources dedicated to this. Even just a single "dirty" bit would require an additional 12.5% increase in physical memory on modern systems, and likely more virtual memory, since you can't just ask Windows to write 9 bits to the hard drive. – Dennis Zickefoose May 11 '11 at 16:46
  • There are many levels of error detection used when debugging modern software. This is just one of them. Nobody presents it as a silver bullet or as the "one true way" to do it. Normally, it is just a single step in the system of security measures. And this single step, despite being rather simple, has proven to be quite effective at what it is supposed to do. The very nature of this measure makes it more appropriate in debug builds (although I can see that sometimes it can be applicable in release builds as well). – AnT stands with Russia May 11 '11 at 16:47
  • Re Andrey's: no argument that garbage-y fill is useful; of course it is. I'm greedy: I want more. Re Dennis's: it's not my intention to design the hardware solution, but one could imagine a single instruction per 4K block, issued by the OS on malloc/VirtualAlloc/etc.; the "invalid" bit would be invisible to software. It costs 12.5% memory, which sounds cheap to me for the benefit. – jeff slesinger May 11 '11 at 22:22
  • Yes, it will only detect some errors, but on the other hand it does detect some errors, and those errors are often quite serious. Note that using "garbage-y" values is not about improving run-time safety; it's more about making it likelier that faults will be evident. The analogy with the airplane would be to simulate malfunctions during test flights (for example, turning one engine off); it's more like **not** following safety procedures when testing, but deliberately breaking things to see if it still would work. – skyking Apr 07 '16 at 08:47