1

Consider the sample code below:

#include <stdio.h>

char* g_commands[]={
    "abcd",
    NULL
};

int main()
{
    
    g_commands[1] = "efg";
    
    char ** tmp = &g_commands[0];
    
    for(; *tmp != NULL; tmp++){
        printf("%s", *tmp);
    }

    return 0;
}

Since tmp walks over the pointers in the g_commands array, I expected the loop to cause a segmentation fault after I assign "efg" to g_commands[1], because the last element of g_commands is no longer NULL. Instead, the program finishes without any error and prints abcdefg.

Why is that? Does the compiler also add a NULL to the end of a char* array?

Ugur KAYA
  • No, it doesn't. You just happened to get lucky in this case. It's undefined behaviour. – Thomas Jager Aug 13 '20 at 12:31
  • You are noticing undefined behavior that happens to allow for "abcdefg" in your environment, but it won't happen across platforms. What is undefined behavior? It basically means what you're doing is outside of the C standard, so different compiler implementations may handle it differently. It's not defined by C itself. – h0r53 Aug 13 '20 at 12:32

3 Answers

3

I expect the loop create a segmentation fault since the last element of g_commands is not null anymore. But the program finishes without exception and prints abcdefg successfully.

Why is it so? Does the compiler add NULL to the end of char* array as well?

You invoke undefined behavior: once tmp points past the end of the array, you dereference it and then try to print whatever indeterminate value it yields with printf("%s", *tmp).

Undefined behavior does not have to produce wrong results. It is a misconception to think that a program is correct just because its output appears to be correct.

You can't expect anything in particular. There is also little point in explaining why undefined behavior plays out the way it does, as that is simply irrelevant when you write production code.

I know some people like to investigate these cases and observe what the implementation does, but in general they aren't worth focusing on if you're interested in writing robust, portable and reliable code.
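
One way to avoid depending on a terminator at all is to bound the loop by the element count. The following is only a minimal sketch: ARRAY_LEN and print_commands are hypothetical names that do not appear in the question, and the count must come from a real array, not from a pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: element count of a true array (not of a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: writes at most `count` strings from `cmds` to `out`,
 * stopping early if a null pointer is found. Returns the number of strings
 * written. The loop never reads past cmds[count - 1]. */
static size_t print_commands(char *const *cmds, size_t count, FILE *out)
{
    size_t printed = 0;
    for (size_t i = 0; i < count && cmds[i] != NULL; i++) {
        fputs(cmds[i], out);
        printed++;
    }
    return printed;
}
```

In the question's main this would be called as print_commands(g_commands, ARRAY_LEN(g_commands), stdout), which stays inside the array even after g_commands[1] has been overwritten.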

2

The program has undefined behavior. In particular, this means the program may produce either the expected result or an unexpected one.

I expect the loop create a segmentation fault since the last element of g_commands is not null anymore

The program works without a segmentation fault because the array g_commands

char* g_commands[]={
    "abcd",
    NULL
};

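is defined in the global namespace (that is, at file scope) and no other object is defined after the array. Such an object has static storage duration, and compilers usually set this memory to zeroes, so the slot just past the end of the array often happens to read as a null pointer. This is not guaranteed by the language.

If you move the definition into main, like this: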

#include <stdio.h>
/*
char* g_commands[]={
    "abcd",
    NULL
};
*/
int main()
{
    char* g_commands[]={
        "abcd",
        NULL
    };
    
    g_commands[1] = "efg";
    
    char ** tmp = &g_commands[0];
    
    for(; *tmp != NULL; tmp++){
        printf("%s", *tmp);
    }

    return 0;
}

then the probability that a segmentation fault will occur is very high.

Vlad from Moscow
  • "global namespace ... compilers set this memory to zeroes." Is this always true? I would imagine `g_commands` would be in `.data` while the literals themselves would be in `.rodata`. I know that `.bss` is zero initialized, but I haven't heard of `.data` being zero initialized, as clearly some other global data may include non-zero values. Perhaps adding more globals would increase the likelihood of a segfault. – h0r53 Aug 13 '20 at 12:55
  • @h0r53 The segments can follow each other. Moreover there can be alignment for example to the size of the paragraph – Vlad from Moscow Aug 13 '20 at 12:59
  • That makes sense. So in principle, declaring additional (not NULL) global variables would increase the likelihood of a segfault in this example, right? – h0r53 Aug 13 '20 at 13:01
  • @h0r53 I think so. We need to see the generated object module. – Vlad from Moscow Aug 13 '20 at 13:02
  • @VladfromMoscow added bunch of global variables and voila, core dump. – Ugur KAYA Aug 13 '20 at 13:13
  • I prefer this answer over the others as it delves into the mystery of what "Undefined" means and why the code *may* still work in certain circumstances. It's easy to suggest that something is undefined, don't trust the output, and that's that. But digging deeper into what is really going on provides a useful perspective beyond that of simply undefined. – h0r53 Aug 13 '20 at 13:18
  • @h0r53 yes, that is my opinion, too. – Ugur KAYA Aug 14 '20 at 12:34
0

Let's go through it step by step.

char* g_commands[]={
    "abcd",
    NULL
};

int main()
{
    
    g_commands[1] = "efg";

At this point g_commands has been altered as if you had initialized it in the following way:

// char* g_commands[]={
//    "abcd",
//    "efg"
// };

Note that there is no longer a terminating null pointer in g_commands from this point on.

The following

    char ** tmp = &g_commands[0];

could have as well been written as

// char ** tmp = g_commands;

Now, when you iterate over the elements of g_commands, you test whether tmp dereferences to a null pointer. Unfortunately, you previously overwrote the last element of g_commands with a non-null pointer, so this

    for(; *tmp != NULL; tmp++){
        printf("%s", *tmp);
    }

is running beyond the bounds of the array and invoking undefined behavior.

    return 0;
}
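
If a null-terminated array is what you want, one option is to reserve a spare slot so that filling the second entry leaves a terminator behind. This is only a sketch under that assumption: commands and count_commands are hypothetical names, and the array size of 3 is chosen purely for illustration. Elements not named in an initializer are zero-initialized, which for pointers means null.

```c
#include <stddef.h>

/* Three slots for at most two commands. Slot 2 is not named in the
 * initializer, so it is initialized to a null pointer and serves as the
 * terminator once slot 1 is filled in. */
static char *commands[3] = { "abcd", NULL };

/* Hypothetical helper: counts entries up to, not including, the first
 * null pointer. Only safe if the array really contains a terminator. */
static size_t count_commands(char *const *cmds)
{
    size_t n = 0;
    while (cmds[n] != NULL) {
        n++;
    }
    return n;
}
```

With this layout, assigning commands[1] = "efg" still leaves commands[2] as a null pointer, so a sentinel-based loop like the one in the question stops inside the array.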
datenwolf