C - strncpy segfaults when using pointer

Question

I've got the following piece of code:

#include <stdio.h>
#include <string.h>

int main(void) {
    char *src = "This is my string.";
    char *dest, *ret;
    //char dest[64], *ret;
    ret = strncpy(dest, src, 5);
    size_t s = strlen(ret);

    printf("src: %s\n", src);
    printf("dst: %s|\n", dest);
    printf("ret: %s|\n", ret);
    printf("len: %zu\n", s);

    //for (int i = 0; i < 5; i++) {
    //    printf("i: %d\n", i);
    //}

    return 0;
}

for loop disabled

$ gcc -g -o test test.c; ./test 
src: This is my string.
dst: This |
ret: This |
len: 5

for loop enabled

$ gcc -g -o test test.c; ./test 
Segmentation fault (core dumped)

I wonder why this is failing only when the for loop is enabled.

Is this just undefined behavior because I'm passing an uninitialized pointer as the dest argument, or is there another explanation?

Looking at the gdb session, it crashed when trying to store the value of the ecx register to the address held in rdi?

(gdb) bt
#0  0x00007ffff7f4a1a7 in __strncpy_avx2 () from /lib64/libc.so.6
#1  0x000000000040116e in main () at stack.c:8
(gdb) x/i 0x00007ffff7f4a1a7
=> 0x7ffff7f4a1a7 <__strncpy_avx2+1591>:    mov    DWORD PTR [rdi],ecx
(gdb) x/i $rdi
0x401060 <_start>:  endbr64
(gdb) p $rdi
$7 = 4198496
(gdb) p $ecx
$8 = 1936287828

This is undefined behaviour. You're using an uninitialized pointer. `strncpy` dereferences this pointer, which could have any value, and tries to write to that memory. You need to have a destination buffer, consisting of valid memory, and pass a pointer to that. — Thomas Jager, Aug 07 '19 at 14:20
@ThomasJager - thanks for the comment, any idea why this happens only when the `for` loop is enabled? — HTF, Aug 07 '19 at 14:24
For me, it gives a segmentation fault even when the for loop is commented out. — Yogesh Chuahan, Aug 07 '19 at 14:28
@HTF It's undefined behaviour. Anything can happen, or nothing bad might happen. You can't predict it. What you are doing is likely messing with other arbitrary variables. The scope of the effects of undefined behaviour is outside of just where you do the behaviour. As soon as you have any in your program, the entire program's behaviour is undefined, and you can no longer predict what will happen. — Thomas Jager, Aug 07 '19 at 14:28
Also note that *strncpy* isn't really a string function, in the sense that it doesn't necessarily produce a NUL-terminated C string in the destination. A good rule of thumb is "never use *strncpy*", because in 99% of use cases it doesn't do what you want. If you have that 1%, you would know. — hyde, Aug 07 '19 at 14:50
*Any idea why this happens only when the for loop is enabled?* This is basically like asking, "Yesterday, I drove my car through a red light, and nothing happened. Today I had the radio on, and when I drove through the red light, a truck crashed into me. Any idea why this happens only when I have the radio on?" — Steve Summit, Aug 07 '19 at 14:53
As per the comment by @VladfromMoscow, changing the declaration to `char dest[5], *ret;` solved the problem for me. — Yogesh Chuahan, Aug 07 '19 at 15:05
The most probable reason is that `printf()` usually involves allocating memory (into which it builds the output string)... the more you interact with memory, the more likely you are to encounter the effects of undefined behaviour. — TripeHound, Aug 07 '19 at 15:19
@YogeshChuahan changing to `char dest[5], *ret;` still leads to undefined behaviour because you go on to use `strlen` on a non-null-terminated string — M.M, Aug 07 '19 at 22:38
Be very careful with `strncpy`, or preferably avoid it altogether. https://the-flat-trantor-society.blogspot.com/2012/03/no-strncpy-is-not-safer-strcpy.html — Keith Thompson, Aug 08 '19 at 00:15
Reopened: writing through an uninitialized pointer is a substantially different case to writing to a string literal — M.M, Aug 08 '19 at 01:33

nickelpro · Accepted Answer · 2019-08-08T00:10:51.773

The answer per the spec that you're going to hear from most people is something like this: The program crashes because you're invoking UB by writing to an uninitialized pointer. At this point, crashing is a valid behavior, so sometimes it crashes and sometimes it does something else which is also valid (because UB).

This is correct-ish, but it doesn't answer your question. Your question was, "Why doesn't it crash in all circumstances?" In your case, you only got a segfault when you changed the structure of your program to include a for loop that seems to perform unrelated behavior. To explain this we need a basic introduction to program memory layout and the nature of segfaults. We'll start with segfaults.

Segmentation Faults and Virtual Memory

A segmentation fault is a somewhat complex beast under the hood if you're unfamiliar with CPU architecture. Its purpose is simple enough: if an executing process tries to access memory that it shouldn't, a segfault should be issued. The devil is in the details: what defines "memory the process shouldn't touch"? And how should the segfault be communicated to the operating system?

On modern operating systems and CPU architectures, a process' valid memory space is controlled using a virtual memory system. The operation of virtual memory is outside the scope of your question, but suffice to say both the operating system and the CPU itself are aware of what addresses your process can and cannot access. If your process strays outside the bounds of its allowed memory space, a segfault will be issued.

To "issue" a segfault the CPU will synchronously interrupt your program, and alert the operating system you've done a naughty thing. These are also called "exceptions" or "traps", but they're all just different nomenclature for "your program asked the CPU to do something that it can't or won't do". The operating system handles the interrupt, and then issues the signal (*Nix) or exception (Win32) to your program. If your program hasn't set up a handler for that signal/exception, the OS gracefully crashes you.

An interesting detail about virtual memory is that it is generally only issued in packages of 2^12 contiguous bytes (4 KiB). So even if your process only wants, say, 10 bytes, it's going to get handed at least 4 KiB. This contiguous grouping of bytes is called a "page" because it groups "lines" of memory.

Program Memory and the Stack

Even if your process never asks for memory using malloc or its ilk, it's going to get handed a couple of pages in order to implement what's called the stack (which lends its name to certain websites). This is where your locally declared variables like src, dest, ret, and s live. It's also used to spill non-volatile CPU registers when moving between function calls, but that is also outside the scope of this answer.

So, if dest is just a piece of memory on the stack, and is never initialized in your program, what's it pointing to? Well, whatever random data happens to exist at that memory address is now your pointer. Your program's operation is now at the whim of garbage bytes from the stack page.

Conclusion

If the garbage in the stack space happens to point somewhere inside one of the memory pages that was issued to your process for stack space, your process won't access invalid memory and will keep on chugging. The same can happen if it points somewhere nearby, since Linux can automatically grow the stack if you're within one page of the last valid page. However, if it points anywhere else, you cause an invalid memory access and the CPU alerts the relevant authorities. Your process is a criminal and will be treated accordingly.

"But nickelpro," you intercede, "what does any of that have to do with the for loop?" Nothing, the for loop is a red herring. In this case it happens to be biasing the stack allocation into a place where the garbage happens to cause a segfault. That could be related to many things, possibly as a consequence of ASLR or just random happenstance. Someone who knows more than me about virtual memory implementations could shine a light on this.

Errata

Your program also has a (I think) unintended bug in it which is exacerbating the problem. You perform the initial string copy with:

ret = strncpy(dest, src, 5);

This does not null-terminate the destination string, because src is longer than the 5 bytes being copied. That means when you call:

size_t s = strlen(ret);

strlen is going to keep reading until it hits a null byte. So even if dest happened to point somewhere valid, bad luck with the memory garbage will cause strlen to read its way into invalid memory.

Worth a reference to [Undefined, unspecified and implementation-defined behavior](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior) — David C. Rankin, Aug 08 '19 at 01:13
While the `for` loop itself is almost certainly a red herring, the `printf()` inside it almost certainly isn't. A call to `printf()` almost always involves allocation of memory, and so very often acts as a "detector" of memory misuse earlier in the code (most commonly because the "misuse" has corrupted the library's internal "housekeeping data" around blocks of allocated and free heap space). — TripeHound, Aug 08 '19 at 06:39
@TripeHound Sure, but the crash in the OP is happening in the original `strncpy`. Any memory allocation happening in either version of the program, `for` loop or not, doesn't get involved. The only driver of the crash is that original `dest` dereference. — nickelpro, Aug 08 '19 at 11:05
