45
char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);

The output is "123456789" with both gcc (Linux) and Visual C++ Express. Does that mean that when there is free memory, I can actually use more than what I've allocated with malloc()?

And why doesn't malloc(0) cause a runtime error?

Thanks.

woongiap
  • As you see in the answers this is a bad idea :) To ensure that you don't have that kind of code you could use tools like valgrind or yamd or something like that. I haven't been a C developer for a long time so I don't know the current tools :) – extraneon Aug 18 '10 at 07:26
  • yeah of course it is a bad idea, I just wanted to know why does c runtime allow me to do so – woongiap Aug 18 '10 at 07:59
  • 1
    It allows you to do so because checking means you have to keep the allocated size of every malloc()ed object and check every conceivable access of that object on whether it violates its bounds, and C's designers considered such runtime checking to be too expensive to always be performed for the environments in which C programs are used. (Pascal, developed at the same time, does perform such checks.) – reinierpost Aug 18 '10 at 08:30
  • but why malloc(0) also works? – woongiap Aug 18 '10 at 08:39
  • 1
    Because whatever algorithm they used didn't bother to waste time checking to make sure people aren't messing with it – Tomas Aug 18 '10 at 08:48
  • @woongiap: Define "works". In any case, `malloc(0)` can either return null, or a pointer to some value that is not to be dereferenced. [See this question](http://stackoverflow.com/questions/1073157/zero-size-malloc). – GManNickG Aug 18 '10 at 08:48
  • @woongiap, `malloc(0)` might work in the sense that it returns a pointer that needs to be passed to `free()` eventually. If it does work, you have no right to do anything more with that pointer than that, or perhaps pass it to `realloc()` to make it have a useful size. Specifically, the amount you ask for is a **binding promise** from your code to the library that you will **never** access anything outside the allocation. If you ask for zero bytes and get a pointer, you must not access any memory at all through that pointer. – RBerteig Aug 18 '10 at 08:49
  • @GMan, it means that the output is "123456789" in both compilers. @RBerteig, I wonder why would any implementation return an address with malloc(0), I think returning NULL if given size_t is not bigger than 0 is straight forward. – woongiap Aug 18 '10 at 08:59
  • As I said, malloc is a heavily used algorithm, and since a program that asks for 0 bytes of memory is likely to crash anyway, it is left to the programmer to decide whether to run a check to make sure everything's in order. – Tomas Aug 18 '10 at 09:04
  • 2
    @woongiap: Then you haven't learned from the thread. It's not working, it's entering undefined behavior. You should learn to separate "works" from "outputs". "The output's correct" is the worst definition of "works". – GManNickG Aug 18 '10 at 09:05
  • @GMan, okay, question edited. – woongiap Aug 18 '10 at 09:07
  • Use alloca() if you want to be sure that it will crash :) – ruslik Aug 18 '10 at 09:10
  • @woongiap: It's the exact reason as `malloc` any number and going beyond your memory: undefined behavior. It might not crash today and crash tomorrow, etc., *it's impossible to know because it's not defined.* – GManNickG Aug 18 '10 at 09:32
  • On my system, gcc says "warning: call to __builtin___strcpy_chk will always overflow destination buffer", and the program outputs "Abort trap". – Josh Lee Aug 18 '10 at 12:56
  • @jleedev, what system you are on? gcc version? mine is 4.1.2. – woongiap Aug 19 '10 at 01:33
  • note: [don't cast the result of malloc in C](http://stackoverflow.com/q/605845/995714) – phuclv Dec 31 '16 at 07:27

17 Answers

79

You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.

To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.

malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given, you were just writing in a "page" that the operating system had granted your program, and thus the operating system did not see any wrongdoing.

The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. Writing to memory locations that are legal (from an operating system perspective) but that another part of the program may also be using is called a "buffer overflow".

If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.

When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!

PS: when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) trigger particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time, and then your system would be totally at the mercy of the consequences of that memory write!

PP.
  • 3
    Excellent answer! I've been doing OS development for a few years and fixing MMU problems on and off for the last year. Until now, I have never thought of thinking of some of the MMU behaviour as a callback into the OS--just as an exception that gets triggered and work that needs to be done. It simplifies things; thankyou! – Sparky Aug 18 '10 at 11:29
23

No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).

There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.

More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.

GManNickG
  • 1
    It is undefined by the language specification, but that doesn't mean it cannot be predicted. As you say, it depends on the OS, compiler and state of your virtual memory. Since this last part is extremely hard to predict and say something useful about, people call it undefined. But look at the answer PP has given. Finally, it most certainly is not correlated in any way to your favourite soda ;) The behavior is very deterministic, given that you have enough information. Btw, +1 anyway since the main idea is correct and for the funny story :) – Henri Aug 18 '10 at 10:05
  • @Henri: I think PP's answer is good in that it touches on some generic platform, but it does fail to talk about the language. It's important people learn "undefined behavior" means "don't do it", not "oh, but how does it work, how is it *defined*?" One can always learn about a platform out of curiosity, not trying to understand some undefined behavior. – GManNickG Aug 18 '10 at 10:32
  • @Henri: For the C language, “undefined” has a very specific meaning. It does not imply that no one can guess what happens, but it does mean that the standard has not defined what should happen in any way and does not require a compiler writer to do so. (The latter is the primary difference to “unspecified behavior.”) In practice, it usually means that crashes or, sometimes, worse can happen in what look like random situations. Including the not-so-rare situation where everything works in your tests and the system goes south when your customer touches it. And only the programmer and QA are to blame… – Christopher Creutzig Aug 18 '10 at 15:23
  • 2
    @Henri: it's more complicated than what you suggest. A compiler can use knowledge that certain code could produce undefined behaviour to optimise in ways that are _not_ predictable, regardless of how well you understand the underlying OS/hardware, without doing a case-specific analysis of the compiler's internals; further, later versions of the same compiler (or a different compiler) may behave differently regardless. A compiler could assume that code which necessarily invokes UB is never executed, for instance, and be conformant (as well as producing efficient code as a by-product). – davmac Apr 19 '17 at 13:37
18

When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.

Note, however, that it is definitely undefined behavior!

Consider the following (hypothetical) example of what might happen when using malloc:

  1. malloc(1)
  2. If malloc is internally out of memory, it will ask the operating system for some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000.
  3. Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
  4. Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
  5. Let's say you do another malloc(1)
  6. Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
  7. If you had written more than 1 byte through the address returned by the first malloc, you will get into trouble when you use the address from the second malloc, because you will overwrite that data.

So the point is you definitely get 1 byte from malloc, but it might be that malloc internally has more memory allocated to your process.

mtvec
  • That is not true at all, at least on Windows. You are given exactly what you ask for. The heap will allocate 1 byte, no more. In the event it is written past it is undefined, but he is definitely NOT given a page. – linuxuser27 Aug 18 '10 at 07:17
  • 1
    @linuxuser27: I'm no expert on Windows, but I *seriously* doubt it gets 1 byte. It would be a waste, it's cleaner just to get a chunk of memory and use that, until you need another chunk of memory. – GManNickG Aug 18 '10 at 07:19
  • It does get 1 byte. When an application starts in Windows, a chunk of memory is pre-allocated by the OS for the heap. Calls to malloc() and the like eventually head down to HeapAlloc(), which then divides up the pre-allocated heap for the process. Of course the heap can grow as needed, but when you ask for 1 byte, that is what you get. If you want to see it, you can call HeapWalk() and see for yourself. I hate to be a stickler for this, but I work on the Visual Studio profiler and do a lot of memory profiling :) – linuxuser27 Aug 18 '10 at 07:25
  • 2
    @linuxuser27: "a chunk of memory is pre-allocated" I think this is what me and Job are talking about (total memory usage), while you're talking about the returned result. – GManNickG Aug 18 '10 at 07:27
  • @GMan @linuxuser27: Yes that's what I'm talking about. See my edits. – mtvec Aug 18 '10 at 07:35
  • 4
    @linuxuser27: Calls to malloc eventually head down to HeapAlloc, and calls to HeapAlloc eventually result in calls to VirtualAlloc, which allocates full pages. So allocating 1 byte *will* allocate a full page. The full page is readable/writable for the process. If the memory is committed, the full 4 KB are used by the process. – Niki Aug 18 '10 at 07:46
  • @nikie yes you are correct. If the current heap size is not enough, VirtualAlloc() is called and it can be called in other instances too. My criticism was originally that the post seemed to imply that if you request even 1 byte, you are going to get a full page allocated for that particular allocation. Which could be interpreted as meaning that the remainder of the page will not be used. It was how I read it, the edits clear that up. Sorry for picking nits here. @Job Step 6 would not return 0x1001 though. It would need to be aligned. But your point has been made. +1 – linuxuser27 Aug 18 '10 at 15:48
4

No. It means that your program behaves badly. It writes to a memory location that it does not own.

Didier Trosset
  • okay, so, is it possible that the extra memory I used would be allocated to someone else? – woongiap Aug 18 '10 at 07:13
  • @woon: *Anything* is possible. You've got UB, trying to guess what happens next is a dead-end. Yes, it could be handed out, maybe not on a common platform, maybe so on some other platform, who knows. – GManNickG Aug 18 '10 at 07:14
  • It can have been allocated to some other malloc call, in which case other variables of your program will change value without actually being changed; or still be unused, and your program will work by chance; or be used by the C library heap memory manager to store malloc internal data, and your program may crash on subsequent call of malloc or free. – Didier Trosset Aug 18 '10 at 07:19
2

You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.

sharptooth
2

You may be able to keep using memory until you reach some program memory or another point at which your application will most likely crash for accessing protected memory.

ckv
  • Which will invariably happen while doing a dog-and-pony show for the customer. And they say that Undefined Behaviour bugs aren't conscious, ha! – msw Aug 18 '10 at 07:21
2

So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important), they do not exactly answer the original question. In fact, any sane malloc implementation will allocate at least as much as the alignment requirement of an int or a void *. Why? Because if it allocated only 1 byte, the next chunk of memory wouldn't be aligned anymore. There's always some bookkeeping data around your allocated blocks, and these data structures are nearly always aligned to some multiple of 4. While some architectures (x86) can access words at unaligned addresses, they do incur some penalties for doing that, so allocator implementers avoid it. Even in slab allocators there's no point in having a 1-byte pool, as allocations that small are rare in practice. So it is very likely that there are 4 or 8 bytes of real room in your malloc'd byte (this doesn't mean you may use that 'feature'; it's wrong).

EDIT: Besides, most malloc implementations reserve bigger chunks than asked for, to avoid too many copy operations when calling realloc. As a test, you can try using realloc in a loop with a growing allocation size and compare the returned pointers; you will see that the pointer changes only after a certain threshold.

Patrick Schlüter
  • Your answer is interesting but not consistent with the one that I think you are referring to. The other author implies that malloc could eventually return some of those bytes from the first allocation to other malloc calls. Therefore the code in question might appear to work now, but if one incorrectly writes to the buffer many times, then eventually problems could result. You seem to be implying something else: that there are possibly some "wasted" bytes in the allocation. You could be right too, in the sense that every system handles this differently. – shawn1874 May 05 '17 at 23:15
  • I referred at that time to the experience I had with the Solaris allocator, which doesn't behave like glibc's allocator. Solaris will always round the allocation size up to at least the next multiple of 16 bytes. malloc(1), for instance, will return a buffer of size 16. This means that you can write past your allocation size without problems. The allocator on Solaris is extremely lenient; use after free rarely crashes, buffer overflows do nothing, etc. When we ported our software to Linux, we had the surprise of triggering a lot of segfaults where Solaris wouldn't care. – Patrick Schlüter May 06 '17 at 14:10
1

You just got lucky there. You are writing to locations which you don't own; this leads to undefined behavior.

codaddict
1

On most platforms you cannot allocate just one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. As a result, you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not defined behaviour.

If you use a few bytes more, you'll very likely get an access violation.

jdehaan
1

malloc allocates the amount of memory you ask for on the heap and then returns a pointer to void (void *) that can be converted to whatever pointer type you want.

It is the responsibility of the programmer to use only the memory that has been allocated. Writing (and even reading, in a protected environment) where you are not supposed to can cause all sorts of random problems at execution time. If you are lucky, your program crashes immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky, it will crash randomly or produce unexpected behavior.

As Murphy's Law says, "Anything that can go wrong, will go wrong", and as a corollary, "It will go wrong at exactly the right time to produce the largest amount of damage". It is sadly true. The only way to prevent that is to use a language in which you cannot do something like that in the first place.

Modern languages do not allow the programmer to write to memory where he/she is not supposed to (at least in standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still do damage using pointers, but it is less likely. That is the reason why smart pointers are so popular.

In order to fix these kinds of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to detect whether the memory was corrupted. When I used to work intensively on C/C++ at work, we used Rational Purify, which in practice replaces the standard malloc (new in C++) and free (delete in C++) and is able to return quite an accurate report on where the program did something it was not supposed to. However, you will never be 100% sure that you do not have any errors in your code. If you have a condition that happens extremely rarely, you may not hit that condition when you execute the program. It will eventually happen in production on the busiest day on the most sensitive data (according to Murphy's Law ;-)

Chris Cinelli
1

To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.

ig2r
0

It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.

default
0

You should use the new and delete operators in C++... and a safe pointer to ensure that operations don't go past the end of the allocated array...

Charlie
0

There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.

When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.

Chris
  • @woongiap: It doesn't have to. It can return memory if it wants, but that memory cannot be used. – GManNickG Aug 18 '10 at 09:33
  • Since the compiler doesn't check if your function arguments are stupid, many malloc() implementations will allocate their smallest allocation unit, usually 8 bytes, for malloc(0) calls. Some malloc() implementations will print errors to standard out or crash if you set environment variables to turn on checking for stupid things like that, but by default it will generally attempt to do what you ask, however idiotic it may be. – Chris Aug 18 '10 at 22:40
  • it's fine that the compiler doesn't check that. malloc(0) might be stupid, but malloc(a-b) isn't. I think malloc() implementation should simply return NULL if given size is not larger than 0 since the actual allocation algorithm is already complicated enough. – woongiap Aug 19 '10 at 01:42
  • Prior to modern threading libraries, malloc(0) was sometimes abused for unique token generation in multithreaded programs, since the synchronization was handled automatically. malloc(1) would work just as well, but malloc(0) was preferred because it made it obvious you were abusing malloc. Platforms that have been around long enough to have code like this written for them are reluctant to change that behavior and break old code. – Chris Aug 19 '10 at 11:04
0

strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.

puts() writes the string until it reaches '\0'.

My guess is that malloc(0) just returns NULL and does not cause a run-time error.

zooropa
0

My answer is in response to "Why does printf not seg fault or produce garbage?"

From The C Programming Language by Brian Kernighan & Dennis Ritchie:

typedef long Align;    /* for alignment to long boundary */

union header {         /* block header */
    struct {
        union header *ptr; /* next block if on free list */
        unsigned size;     /* size of this block */
    } s;
    Align x;           /* force alignment of blocks */
};

typedef union header Header;

The Align field is never used; it just forces each header to be aligned on a worst-case boundary. In malloc, the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains one more unit, for the header itself, and this is the value recorded in the size field of the header. The pointer returned by malloc points at the free space, not at the header itself.

The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.

   -----------------------------------------
   |        |     SIZE     |               |
   -----------------------------------------
     |                     |
  points to                |----- address returned to user
  next free
  block
        -> a block returned by malloc

In statement

char* test = malloc(1);

malloc() will search the heap section of RAM for a run of consecutive free bytes of the requested size; if one is available, it returns its address, as shown below:

 --------------------------------------------------------------
| free memory  | memory in size allocated for user |           |
----------------------------------------------------------------
                                                              0x100(assume address returned by malloc)
                                                              test

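So when malloc(1) is executed, it doesn't allocate just 1 byte; it allocates some extra bytes to maintain the structure/heap table above. In an allocator laid out like this, the size is stored just before the returned block, so you might be tempted to find out how much memory was actually allocated by printing test[-1]. Be aware that reading before the returned pointer is itself undefined behavior, and the layout and width of whatever is stored there depend on the implementation, so the printed value is not guaranteed to mean anything.

char *test = malloc(1);
/* undefined behavior: peeks at allocator bookkeeping, for illustration only */
printf("memory allocated in bytes = %d\n", test[-1]);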
Achal
-1

If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

Eric Aya
Duck Ling