
I have a question about this code below:

#include <stdio.h>

char abcd(char array[]);

int main(void)
{
    char array[4] = { 'a', 'b', 'c', 'd' };

    printf("%c\n", abcd(array));

    return 0;
}

char abcd(char array[])
{
    char *p = array;

    while (*p) {
        putchar(*p);
        p++;
    }
    putchar(*p);
    putchar(p[4]);
    
    return *p;
}

Why doesn't this program produce a segmentation fault when it reaches putchar(*p) right after exiting the while loop? I thought that once p moves past array[3], the memory locations beyond it have no value assigned to them. For example, I thought accessing p[4] would be illegal because it is out of bounds. On the contrary, the program runs with no errors. Is this because any memory that has not been assigned a value (in this case, everything other than the four elements of array) is null, that is, '\0'?

chqrlie
Fary
  • Because undefined behavior is undefined. It might segfault or it might not. – tkausl Jul 02 '22 at 05:04
  • "right after exiting while loop?" --> the loop does not certainly exit. – chux - Reinstate Monica Jul 02 '22 at 05:20
  • "trying to access p[4] would be illegal because it would be out of the bound," --> no it is not _illegal_, it is _undefined_. Anything may happen. – chux - Reinstate Monica Jul 02 '22 at 05:22
  • You're right, trying to access `p[4]` is undefined. But there's another problem: your loop `while(*p)` doesn't do what you think it does. When you declared your original `array[4]`, there's no guarantee it has a 0 after it. Your loop could very easily try to print a bunch of garbage characters that it found past the end of `array`. (And, yes, it would be engaging in undefined behavior for each extra value that it fetched.) Your loop would be okay *if* you had declared `char array[5] = { 'a', 'b', 'c', 'd' };` or `char array[] = "abcd";`, but not as it is. – Steve Summit Jul 02 '22 at 12:39
  • Related: [1](https://stackoverflow.com/questions/9137157) [2](https://stackoverflow.com/questions/6452959) [3](https://stackoverflow.com/questions/1239938) [4](https://stackoverflow.com/questions/11551472) [5](https://stackoverflow.com/questions/15646973) [6](https://stackoverflow.com/questions/55692816) [7](https://stackoverflow.com/questions/12410016) [8](https://stackoverflow.com/questions/57247807) [9](https://stackoverflow.com/questions/57930992) – Steve Summit Jul 02 '22 at 12:45

4 Answers


OP seems to think that something special should happen when an array is accessed out of bounds.

Accessing outside array bounds is undefined behavior (UB). Anything may happen.

chux - Reinstate Monica

Let's clarify what undefined behavior is.

The C standard is a contract between the developer and the compiler as to what the code means. However, it just so happens that you can write things that are just outside what is defined by the standard.

One of the most common cases is out-of-bounds access. Other languages specify that this results in an exception or another error. C does not. One argument for this choice is that it would require adding a costly check to every array access.

The compiler does not know that what you are writing is undefined behavior¹. Instead, the compiler assumes that what you write contains no undefined behavior, and translates your code to assembly accordingly.

If you want an example, compile the code below with and without optimizations:

#include <stdio.h>

int table[4] = {0, 0, 0, 0};

int exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) {
            return 1;
        }
    }
    return 0;
}

int main(void) {
    printf("%d\n", exists_in_table(3));
}

Without optimizations, the assembly I get from gcc does what you might expect: it simply reads one element past the end of the table, which might cause a segfault if the array happens to end right at the boundary of an unmapped page.

With optimizations, however, the compiler looks at your code and notices that the loop cannot run to completion (otherwise, it would access table[4], which cannot happen in a program without undefined behavior), so the function exists_in_table must return 1 from inside the loop. And we get the following, valid, implementation:

exists_in_table(int):
        mov     eax, 1
        ret

Undefined behavior means undefined. It is very tricky to detect, since it can be virtually invisible after compilation. You need an advanced static analyzer to interpret the C source code and determine whether what it does can be undefined behavior.

¹ In the general case, that is; modern compilers perform some basic static analysis to detect the most common errors.

qsantos
  • What a blatant example of toxic UB exploitation by optimizing compilers. Compilers that take advantage of UB this way should output a proper diagnostic: `potential undefined behavior interpreted as intended`, just like some expressions involving operators with error prone precedence levels are flagged as potential programmers errors. – chqrlie Jul 02 '22 at 11:06
  • I think you underestimate how often this assumption is used by compilers. Such a diagnostic would happen for any non-trivial program. As I said in the footnote, compilers typically include basic static analyzers to find common mistakes. However, proving the absence of UBs in a non-trivial program is much harder than pattern matching. Doing it without heavy user intervention is currently an open problem. In practice, your compiler cannot tell you that your code has no undefined behavior. A whole language, Rust was created to make it possible to eliminate _some_ undefined behaviors. – qsantos Jul 02 '22 at 12:25
  • you are probably right, and I have enough background to understand how this problem cannot be solved in most cases, but for the compiler to assume that `return 0;` cannot be reached in your example relies on UB detection. This should be reported IMHO. – chqrlie Jul 02 '22 at 13:16
  • You are using human reasoning to infer that this is a mistake from the programmer. For all the compiler knows, it could just be part of the function contract that v must be a value of table. For instance, it's pretty common to have functions that assume that the pointer they are given is valid, that's part of an implicit contract that the compiler cannot guess. – qsantos Jul 03 '22 at 05:17

C does no bounds checking on array accesses; because of how arrays and array subscripting are implemented, it can't do any bounds checking. It simply doesn't know that you've run past the end of the array. The operating environment will typically raise a runtime error only if the access lands in a page that is unmapped or protected, but up until that point you can read or clobber any memory following the end of the array.

The behavior on subscripting past the end of the array is undefined - the language definition does not require the compiler or the operating environment to handle it any particular way. You may get a segfault, you may get corrupted data, you may clobber a frame pointer or return instruction address and put your code in a bad state, or it may work exactly as expected.

John Bode
  • *it can't do any bounds checking* Well, it can if you implement full-blown [smart pointers](https://en.wikipedia.org/wiki/Smart_pointer), but of course there's some overhead... – Steve Summit Jul 02 '22 at 13:29

A few remarks about your program:

  • array in main and array in abcd are different things. In main, it is an array of 4 elements; in abcd, it is a parameter declared with array syntax, which is really a pointer to char. If you write something like array[4] inside main, the compiler may warn about it, but there won't be a warning for the same access inside abcd.
  • p is a pointer to array, or in other words, it points to the first element of array. In C, there isn't any boundary or limit for p. Your program is lucky because the memory after array happens to contain a 0 value that stops the while (*p) loop. If you check the value of p after the loop, it might not be equal to &array[4].
ThongDT