Why does it work if the size of the buffer is smaller than nbyte?

Question

The code looks like this:

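#define BUFSIZ 5
#include <stdio.h>
#include <unistd.h>   /* declares read() */

int main(void)
{
    char buf[BUFSIZ];
    int n;
    n = read(0, buf, 10);   /* asks for up to 10 bytes, but buf holds only 5 */
    printf("%d", n);
    printf("%s", buf);      /* buf is not null-terminated */
    return 0;
}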

I input abcdefg, and the output is:

8abcdefg

In read(0, buf, 10), the 10 is larger than 5, which is the size of buf, but it doesn't seem to lead to a wrong result. Does anyone have an idea why? Thanks!

You rented a 5 foot deep container and then you jammed an 8 foot long crate in it, busting the container wall into another hidden container behind yours. Luckily the other container was empty and unused in your case. — nos, Nov 15 '13 at 09:58

Gian · Accepted Answer · 2013-11-15T08:09:28.353

This is a quirk of how allocation in C works. You have a buffer allocated on the stack, which is really just a chunk of contiguous memory that you can read and write. C does not check array bounds, so nothing stops you from writing off the end of this array, and in this case it just so happens to work. Perhaps on your machine, with your particular compiler and stack layout, you don't end up overwriting anything important :-)

Relying on this behavior being the same between compiler versions is not advised.
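
For comparison, here is a minimal sketch of a bounded version of the call from the question, assuming a POSIX system. The helper name read_bounded is hypothetical and not part of any library. It passes the real capacity of the buffer to read() and null-terminates the result so that printing it with %s is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read at most cap - 1 bytes from fd into buf and
 * null-terminate the result. Returns the number of bytes stored, or -1
 * if read() fails. */
ssize_t read_bounded(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    ssize_t n = read(fd, buf, cap - 1);   /* never ask for more than fits */
    if (n < 0)
        return -1;

    buf[n] = '\0';                        /* n <= cap - 1, so this is in range */
    return n;
}
```

Called as read_bounded(0, buf, sizeof buf) with the five-byte buf from the question, it requests at most four bytes, so any further input stays unread instead of landing past the end of the array.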

edited Nov 15 '13 at 08:09

answered Nov 15 '13 at 08:04

same with the heap. can write whatever you want wherever you want – amdixon Nov 15 '13 at 08:05
You don't even need a new compiler version to make this fail. Just inserting a function call between `read` and `printf` can cause the behavior to change. – Fred Foo Nov 15 '13 at 09:59

Damon · Answer 2 · 2013-11-15T10:35:46.233

You can in principle¹ read from and write to any address, but it is only safe and meaningful to access data in an organized, well-defined manner.

The purpose of memory allocation (explicit or implicit) is to bring order into chaos. When you declare your buf array, a small block of memory is reserved on the stack.
Usually, allocations have a certain alignment and sometimes a certain minimum size, and the operating system can only detect wrong accesses at a very coarse level. As a result, there will often be small gaps between your allocated memory blocks, small areas that you can write to and read from seemingly without "anything bad" happening. You should pretend that this isn't the case, and you should not even think about using these implementation details to your advantage.

Your code example "works" because you were unlucky enough not to hit an unallocated or write-protected memory page, and you didn't overwrite another vital stack value that would have caused the application to crash (such as the function's return address).
I am purposely saying "unlucky", not "lucky", as the fact that it appears to work is not a good thing. It's incorrect code², and such code should crash early, so you can detect and fix the problem. Otherwise it may lead to problems that are very hard to diagnose and that appear to occur at an entirely unrelated time or location. Even if it works now, you have no guarantee whatsoever that it will work tomorrow (or on a different computer, or with a different compiler, or with ever so slightly different code).

Memory allocation is generally a three-step process. It consists of an allocation request to the operating system made by the C library (which usually does not correspond directly to your requests), followed by some bookkeeping done in the library, and a promise made by you. At the operating system level, the actual physical allocation happens page by page, on demand, as you access memory for the first time, provided that the C library has previously requested allocation for the accessed location.
In the case of stack allocation, the process is somewhat simpler, since the program really only has to decrement one special register (the stack pointer), but this is mostly irrelevant for you. The concept remains the same.

The promise you make is that you will only ever read from or write to the agreed area, and this is the primary thing that is important for you.
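
One way to make that promise explicit in code is to keep the capacity next to the pointer and check it on every write. The following is only a sketch under that assumption; the names bounded_buf and bounded_put are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type: a buffer that carries its own capacity, so the
 * "agreed area" is recorded alongside the pointer. */
struct bounded_buf {
    char  *data;
    size_t cap;   /* number of bytes reserved at data */
    size_t len;   /* number of bytes currently in use */
};

/* Append one byte. Returns false instead of writing past cap. */
bool bounded_put(struct bounded_buf *b, char c)
{
    assert(b != NULL && b->len <= b->cap);

    if (b->len == b->cap)
        return false;          /* full: refuse rather than overflow */

    b->data[b->len++] = c;
    return true;
}
```

With a five-byte backing array, the sixth call to bounded_put reports failure to the caller, so the mistake shows up at the point where it happens rather than at some later, unrelated place.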

It can happen that you break your promise (deliberately or by accident) and it still "works", but that is pure coincidence.
On the stack, you will sooner or later overwrite first the storage of some local variables (which may go undetected if they're cached in a register) and eventually the return address, which will almost certainly cause a crash (or similar undesired behavior) when the function returns. On the heap, you may overwrite some other program data or access a page that hasn't been communicated to the operating system as being reserved. In the latter case, the program will be terminated immediately.
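
On the heap, the usual alternative to writing past the end of a block is to enlarge the block first. Below is a minimal sketch of that idea using the standard realloc(); the helper name heap_append and its parameter layout are assumptions made for illustration, not an established API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append n bytes from src to the heap buffer *data,
 * growing the allocation first when it is too small.
 * Returns 0 on success, -1 if the memory could not be obtained. */
int heap_append(char **data, size_t *cap, size_t *len,
                const char *src, size_t n)
{
    assert(data != NULL && cap != NULL && len != NULL && src != NULL);
    assert(*len <= *cap);

    if (n == 0)
        return 0;
    if (n > SIZE_MAX - *len)
        return -1;                     /* requested size would overflow */

    if (n > *cap - *len) {
        size_t new_cap = *len + n;
        char *p = realloc(*data, new_cap);
        if (p == NULL)
            return -1;                 /* the old block is still valid */
        *data = p;
        *cap = new_cap;
    }

    memcpy(*data + *len, src, n);      /* now guaranteed to fit */
    *len += n;
    return 0;
}
```

Starting from data = NULL with cap and len both zero, two calls appending "abcde" and then "fg" leave a seven-byte buffer that the caller eventually releases with free().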

¹ Let's not consider virtual memory and page protections for an instant.
² Strictly speaking, it's not incorrect code, but code that invokes undefined behavior. However, overwriting unallocated memory is in my opinion serious enough to merit the label "incorrect".
