I'm a Java programmer struggling to pick up C. In particular, I am struggling to understand strcat(). If I call:

strcat(dst, src);

I get that strcat() will modify my dst String. But shouldn't it leave the src String alone? Consider the code below:

#include<stdio.h>
#include<string.h>

void printStuff(char* a, char* b){
        printf("----------------------------------------------\n");
        /* strlen() returns size_t, so the matching conversion is %zu, not %d */
        printf("src: (%zu chars)\t\"%s\"\n",strlen(a),a);
        printf("dst: (%zu chars)\t\"%s\"\n",strlen(b),b);
        printf("----------------------------------------------\n");
}

int main()
{
        char src[25], dst[25];
        strcpy(src,  "This is source123");
        strcpy(dst,  "This is destination");

        printStuff(src, dst);
        strcat(dst, src);
        printStuff(src, dst);

        return 0;
}

Which produces this output on my Linux box, compiling with GCC:

----------------------------------------------
src: (17 chars) "This is source123"
dst: (19 chars) "This is destination"
----------------------------------------------
----------------------------------------------
src: (4 chars)  "e123"
dst: (36 chars) "This is destinationThis is source123"
----------------------------------------------

I'm assuming that the full "This is source123" String is still in memory and strcat() has advanced the char* src pointer forward 13 chars. But why? Why 13 chars? I've played around with the length of the dst string, and it definitely has an impact on the src pointer after strcat() is done. But I don't understand why...

Also... how would you debug this, in GDB, say? I tried "step" to step into the strcat() function, but I guess that function wasn't analyzed by the debugger; "step" did nothing.

Thanks! -ROA

PS - One quick note, I did read through similar strcat() posts on this site, but didn't see one that seemed to directly apply to my question. Apologies if I missed the post which did.

Pete
  • "I'm assuming that the full "This is source123" String is still in memory" --> maybe. Once code plays outside its sandbox (writes outside array bounds), anything may happen: undefined behavior (UB). Do not expect "But shouldn't it leave the src String alone?" to hold. The results and explanations may make sense today, but the results tomorrow may differ. – chux - Reinstate Monica Sep 15 '16 at 20:43
  • You only allocated a `char[25]` for `dst`. How do you expect to fit 37 chars in there (including a trailing null)? – user2357112 Sep 15 '16 at 20:43
  • `src` + `dst` concatenated have more than the 24 + 1 terminator char you have allocated with `dst[25]` – Weather Vane Sep 15 '16 at 20:43
  • If you overflow or go outside of your allocated arrays by just one char, you're in the anything-can-happen undefined behavior land. – Petr Skocik Sep 15 '16 at 20:48
  • Note: the fact that `src` was overwritten is a coincidence, since it just happened to be allocated in memory (on the stack) at an address that is just after where `dst` is allocated. As a result continuing past the end of `dst` will overwrite `src`. If you switch the order of your declarations then the result may be different: you may overwrite the stack frame instead, including the return address for the function and saved registers. – dsh Sep 15 '16 at 21:00
  • " strcat() has advanced the char* src pointer" - no, strcat does not advance any pointer. It copies characters around. – M.M Sep 15 '16 at 21:23
  • @WeatherVane Ahhhhhhh... That makes a WORLD of sense. I upped the size of src & dst to 50 bytes, and the issue vanished. Lesson learned... Thanks!!! – Pete Sep 16 '16 at 17:57
  • @M.M Hmmm... "It copies characters around." Huh. I'll have to think about that. Not sure how that impacts my src string in this case. But thanks! – Pete Sep 16 '16 at 17:59
  • I understand that this is undefined behavior but what exactly is changing the source string? – Lokesh Oct 05 '17 at 13:26

2 Answers

Your destination doesn't have enough memory allocated to hold the new concatenated string. In this case, that probably means strcat() is overwriting src as it writes beyond the bounds of dst.
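
To make the "copies characters around" point concrete, here is a sketch that models the overwrite without invoking undefined behavior itself. It assumes, purely for illustration, that `src` sits 32 bytes after the start of `dst`; that offset is picked because it reproduces the "e123" result from the question, and the real layout is entirely up to the compiler. `naive_append` and `simulate_overflow` are made-up names, and the hand-written loop stands in for strcat() because calling strcat() on overlapping strings is itself undefined.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: one 64-byte region standing in for the stack, with
   "dst" at offset 0 and "src" at offset 32. The 32-byte gap is an
   assumption chosen to match the question's output. */
enum { REGION_SIZE = 64, SRC_OFFSET = 32 };

/* Byte-by-byte append, written by hand so the overlap is well defined. */
static void naive_append(char *dst, const char *src)
{
    char *out = dst + strlen(dst);
    while ((*out++ = *src++) != '\0') {
        /* keep copying until the terminator has been copied */
    }
}

/* Runs the model and copies whatever ends up at the "src" offset
   into result. */
void simulate_overflow(char *result, size_t result_size)
{
    char region[REGION_SIZE] = {0};
    char *dst = region;
    char *src = region + SRC_OFFSET;

    strcpy(src, "This is source123");
    strcpy(dst, "This is destination");

    /* Writes start at region[19]; the 14th byte written lands on
       region[32], which is the first byte of "src". */
    naive_append(dst, src);

    assert(strlen(src) < result_size);
    strcpy(result, src);
}
```

Under that assumed layout, the first 13 appended characters fill the tail of `dst` and the gap, and the remaining "e123" plus terminator land on top of `src`. No pointer moves; the bytes at the start of `src` are simply replaced.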

Allocate enough memory for dst and it should work without overwriting the source string. Note that the buffer holding the concatenated string needs to be at least the combined length of the two strings (36 in your case) plus one byte for the null terminator, so 37 bytes here.
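
As a minimal sketch of that sizing rule, the helper below refuses to concatenate unless both strings and the terminator fit in the destination. `checked_concat` is a hypothetical name used only for illustration; it is not a standard library function.

```c
#include <assert.h>
#include <string.h>

/* Appends src to dst only if strlen(dst) + strlen(src) + 1 <= dst_size.
   Returns 0 on success, or -1 (leaving dst untouched) if it would not fit. */
int checked_concat(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t dst_len = strlen(dst);
    size_t src_len = strlen(src);

    /* Room left after dst's current contents must exceed src_len,
       so that the terminator fits as well. */
    if (dst_len >= dst_size || src_len >= dst_size - dst_len) {
        return -1;
    }

    strcat(dst, src);
    return 0;
}
```

With the strings from the question, a `char dst[64]` succeeds (37 bytes are needed), while the original `char dst[25]` makes the function return -1 instead of writing past the end of the array.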

rfreytag
  • worth mentioning that this sort of bug is a very common security issue, and that [`strncat`](http://en.cppreference.com/w/c/string/byte/strncat) is itself... a security issue as it doesn't behave as people expect. – Mgetz Sep 15 '16 at 20:48
  • @Mgetz I'd say strncat behaves as one would expect. Are you confusing it with strncpy maybe? – hyde Sep 15 '16 at 21:02
  • @hyde, I'd say that people are prone to expect that the length parameter passed to `strncat()` represents the total size of the destination buffer, or maybe that less one, which would have been the sensible design. That it instead represents an upper bound on the number of characters to transfer is not only surprising but also harder to use. – John Bollinger Sep 15 '16 at 21:26
  • @hyde: what is the length argument to `strncat()`? If you were about to say "the length of the target buffer", then that's why `strncat()` is a horrendous interface and liable to cause trouble. If you know enough to be able to use `strncat()` safely, you could use `strcpy()` or `memmove()` or `memcpy()` safely instead, and more efficiently. – Jonathan Leffler Sep 16 '16 at 00:24
  • @JonathanLeffler :-D OMG. Good thing I always use *snprintf* for string concatenation in C, and can't think of any code I'd have to go and fix now... At least the use as replacement for *strncpy* works as expected. – hyde Sep 16 '16 at 04:17
  • @hyde: Yes, the interface to [`strncat()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/strncat.html) is eccentric. However, as you say, if the target string is empty and you use `strncat(was_empty_target_string, source_string, sizeof(was_empty_target_string));` then `strncat()` does work as expected. Any other scenario and it does not. – Jonathan Leffler Sep 16 '16 at 04:22
  • @hyde no I meant what I said, `strncat` has weird semantics, to the point that both MS and the BSD folks rejected it for [`strcat_s`](https://msdn.microsoft.com/en-us/library/td1esda9.aspx) and [`strlcat`](http://man.openbsd.org/cgi-bin/man.cgi?query=strlcat) respectively; on the basis that while `strncat` can be used safely it is not in the majority of cases because most programmers don't bother to understand its semantics. – Mgetz Sep 16 '16 at 11:26
  • @Mgetz Yeah... I learned something new today, see my comment above. – hyde Sep 16 '16 at 14:09

Yes, I'm sure everything to do with manual memory management comes with some difficulty if your background is strictly Java.

With regard to anything related to C strings, it will probably be useful to put everything you know about Java Strings out of your head. The closest Java analogs of C strings are char[] and byte[]. Even there you can get in trouble, however, because Java performs bounds-checking for you, but C does not. In fact, C allows you to do all manner of things that you oughtn't to do, all the while standing back and quietly murmuring, "who knows what will happen if you do that?".

In particular, when you call strcat() or any other function that writes into a char array, you are responsible for ensuring that there is enough space in the destination array to accommodate the characters. If there isn't, then the resulting behavior is undefined (who knows what will happen?). You exercised just such undefined behavior.

Generally speaking, you need to do one or more of these things:

  • Have a hard upper bound on the size that could be needed, and allocate at least that much space, or
  • Know how much space you have, and work within that space (e.g. truncate any excess), or
  • Track how much space you have and how much space you need, and allocate more space as needed (being sure to later free all dynamically allocated space when you no longer need it).
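
The second and third options above might look roughly like this. Both helpers are hypothetical sketches, not standard functions: `append_truncating` works within a fixed buffer and drops whatever does not fit, while `append_growing` assumes the destination is either NULL or a string obtained from malloc()/realloc().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Option 2: work within the space you have. Appends as much of src as
   fits in dst (dst_size bytes in total) and always leaves dst terminated.
   Returns 1 if src was truncated (or formatting failed), 0 otherwise. */
int append_truncating(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    assert(used < dst_size);        /* dst must already hold a valid string */

    size_t room = dst_size - used;  /* bytes left, terminator included */
    int n = snprintf(dst + used, room, "%s", src);
    return n < 0 || (size_t)n >= room;
}

/* Option 3: allocate more space as needed. dst must be NULL or heap
   allocated. Returns the (possibly moved) string, or NULL on allocation
   failure, in which case the original dst is still valid. The caller
   frees the result with free(). */
char *append_growing(char *dst, const char *src)
{
    size_t old_len = dst ? strlen(dst) : 0;
    size_t add_len = strlen(src);

    char *grown = realloc(dst, old_len + add_len + 1);
    if (grown == NULL) {
        return NULL;
    }

    memcpy(grown + old_len, src, add_len + 1);  /* copies the terminator too */
    return grown;
}
```

With the 25-byte `dst` from the question, `append_truncating` keeps only the first five characters of the source and reports truncation, so nothing outside `dst` is touched. `append_growing` holds the whole 36-character result but hands ownership of the memory to the caller.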
John Bollinger