2

The following program

// Code has taken from http://ideone.com/AXClWb
#include <stdio.h>
#include <string.h>

#define SIZE1 5
#define SIZE2 10
#define SIZE3 15

int main(void){
    char a[SIZE1] = "Hello";
    char b[SIZE2] = " World";
    char res[SIZE3] = {0};

    for (int i=0 ; i<SIZE1 ; i++){
        res[i] = a[i];
    }

    strcat(res, b);
    printf("The new string is:  %s\n",res);
    return 0;
}

has well defined behavior. As per the requirement, source string b is null terminated. But what would be the behavior if the line

char res[SIZE3] = {0};  // Destination string

is replaced with

char res[SIZE3];  

Does standard says explicitly about the destination string to be null terminated too?

Sourav Ghosh
  • 133,132
  • 16
  • 183
  • 261
haccks
  • 104,019
  • 25
  • 176
  • 264
  • 2
    I think `strcat` start searching for the `null terminator` into the `res` string before to concatenate the string, so it is UB. – LPs Aug 11 '16 at 09:53
  • @LPs What exactly is UB? Doesn't this line `char res[SIZE3] = {0};` initialize `res` to `15` ZERO's? – Michi Aug 11 '16 at 10:24
  • @Michi Read another look at the question: OP wants to know if `char res[SIZE3];` instead. – LPs Aug 11 '16 at 10:29
  • @LPs I noticed to late that. This was my [CODE](http://stackoverflow.com/questions/38891031/how-is-an-array-stored-in-memory-in-this-program/38891207?noredirect=1#comment65144649_38891207) in another Question. – Michi Aug 11 '16 at 10:32
  • `UB` = Undefined Behaviour. – Koshinae Aug 11 '16 at 10:32
  • @haccks that was the reason why that `for` is present there. [If you remove](http://ideone.com/vCUCW8) that `for` and you will replace it with `strcpy`, even if you say `char res[SIZE3] = {0};` is still UB, because `strcpy` search for that `'\0'` too and `char a[SIZE1] = "Hello";` doesn't have it. This is the major problem when dealing with `strcpy` and `strcat` many they don't really know when should use `strcpy` or `strcat` .... or both. – Michi Aug 11 '16 at 10:49
  • The way `strcat` works is that it iterates through `dest` string characters until it reaches the null character in `dest`, and then appends `src` characters until it reaches the null character in `src`. It depends on `dest` being null terminated. Concatenating multiple strings is therefore relatively expensive, i.e. the function always starts from the beginning of the destination string. It would be cool if `strcat` returned the pointer to the last (null) character to avoid this, but unfortunately it just returns `dest` again. – vgru Aug 11 '16 at 10:57
  • 1
    @Groo: I find myself hard-pressed to think of many use cases for `strcat` as defined. A variation of strcpy that accepted "start" and "one-past" pointers for the destination buffer and returned the either the location of the trailing zero (if written) or the one-past pointer of the destination (if full) would seem much more useful. – supercat Aug 11 '16 at 18:53

4 Answers4

3

I think man explicitly says that

Description

The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable; buffer overruns are a favorite avenue for attacking secure programs.

Enphasis mine

BTW I think strcat starts searching for the null terminator into the dest string before to concatenate the new string, so it is obviously UB, as far as dest string has automatic storage.

In the proposed code

for (int i=0 ; i<SIZE1 ; i++){
    res[i] = a[i];
}

Copy 5 chars of a and not the null terminator to res string, so other bytes from 5 to 14 are uninitialized.

Standard also says about safaer implementation strcat-s

K.3.7.2.1 The strcat_s function

Synopsis

    #define _ _STDC_WANT_LIB_EXT1_ _ 1
      #include <string.h>
      errno_t strcat_s(char * restrict s1,
           rsize_t s1max,
           const char * restrict s2);

Runtime-constraints

2 Let m denote the value s1max - strnlen_s(s1, s1max) upon entry to strcat_s.

We can see that strlen_s always return them valid size for the dest buffer. From my point of view this implementation was introduced to avoid the UB of the question.

LPs
  • 16,045
  • 8
  • 30
  • 61
3

TL;DR Yes.


Since this is a language-lawyer question, let me add my two cents to it.

Quoting C11, chapter §7.24.3.1/2 (emphas is mine)

char *strcat(char * restrict s1,const char * restrict s2);

The strcat function appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1.[...]

and, by definition, a string is null-terminated, quoting §7.1.1/1

A string is a contiguous sequence of characters terminated by and including the first null character.

So, if the source char array is not null-terminated (i.e., not a string), strcat() may very well go beyond the bounds in search of the end which invokes undefined behavior.

As per your question, char res[SIZE3]; being an automatic local variable, will contain indeterminate value, and if used as the destination of strcat(), will invoke UB.

haccks
  • 104,019
  • 25
  • 176
  • 264
Sourav Ghosh
  • 133,132
  • 16
  • 183
  • 261
2

If you leave res uninitialized then, after the copying a into res (in for loop), there's no NUL terminator in res. So, the behaviour of strcat() is undefined if the destination string doesn't contain a NUL byte.

Basically strcat() requires both of its arguments to be strings (i.e. both must contain the terminating NUL byte). Otherwise, it's undefined behaviour. This is obvious from the description of strcat():

§7.23.3.2, strcat() function

The strcat function appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1.

(emphasis mine).

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
P.P
  • 117,907
  • 20
  • 175
  • 238
  • `a` is not copied to `res`, its `b` and it is null terminated. – haccks Aug 11 '16 at 09:58
  • I was talking about the copying done in the for loop, where 5 bytes are copied into `res`, leaving it without the NUL terminator. Hence, the subsequent `strcat()` results in UB. – P.P Aug 11 '16 at 09:59
1

If char res[SIZE3]; is on the stack, it'll have random/undefined stuff in it.
You'll never know whether there'll be a zero byte within res[SIZE3], so yes, strcatting to that is undefined.

If char res[SIZE3]; is an uninitialized global, it'll be all zeros, which will make it behave as an empty c-string, and strcating to it will be safe (as long as SIZE3 is large enough for what you're appending).

Petr Skocik
  • 58,047
  • 6
  • 95
  • 142