2

I want to concatenate two strings which are defined like this:

char hello[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char world[] = { ',', ' ', 'W', 'o', 'r', 'l', 'd', '!', '\0' };

I understand that I should walk through the first string, find the '\0' terminator, and start writing the second string in its place. Does strcat work the same way?

The code that I'm using:

for (int i = 0; i < 6; i++) {
    if (hello[i] == '\0') {
        for (int j = 0; j < 9; j++) {
            int index = 5 + j;
            hello[index] = world[j];
        }
    }
}

When I run the compiled program I get this error:

*** stack smashing detected ***: ./run terminated

What am I doing wrong?

Peter Mortensen
Viacheslav Kondratiuk

5 Answers

14

My answer won't focus on concatenation at first. Instead it addresses some issues in your code as it stands and gives some background that may help clarify how to think about things in C. Then we'll look at concatenating the strings.

Before we start, some thoughts on the structure of C-strings

Thinking in C is very much like thinking like a computer (CPU, memory, etc.). The data types C offers are the ones a CPU works with natively: chars (single bytes), shorts (typically two bytes), longs (at least four bytes), ints, floats, and doubles. On top of these you can create arrays of them, or pointers to the memory locations where they live.

So how do we create a string then? Do we create a new type?

Well, since CPUs don't understand strings, neither does C. Not in its most primitive form anyway (the language has no dedicated string type).

But strings are very useful, so a reasonably simple notion of what a string should be had to be decided upon.

A C string is just a run of bytes in sequential memory, none of which is NUL, followed by a NUL byte that marks the end.

NUL (pronounced something like nool) is the name we give to a byte in memory whose value is 0. In C it is written '\0'. So when I write NUL, I mean the character '\0'.

NOTE 1: This is different from C's NULL, which is a null pointer (a pointer that points to nothing), not a character.

NOTE 2: NUL is of course not the character zero ('0'), which has the value 48 in ASCII.

So any function that works on strings starts at a memory location pointed to by a char * (read: char pointer) and keeps doing its work byte (character) by byte (character) until it runs into a byte with value 0, which indicates the end of the string. At that point it stops, because the string has ended, and returns the result of its operations.

So if we define our strings as arrays of characters that end in 0, we completely avoid having to create any artificial concept of a string beyond that.

And that's exactly what C does: it settled on this notion as the convention to use, and the compiler provides a simple shortcut for declaring NUL-terminated arrays of characters using double quotes. That's it. There is no special type for strings in C.

So with all of this in mind let's look at your code:

char hello[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char world[] = { ',', ' ', 'W', 'o', 'r', 'l', 'd', '!', '\0' };

You declared two arrays of char (single bytes) and terminated each of them with \0. This is IDENTICAL to the following C statements:

char hello[] = "Hello";
char world[] = ", World!";

When compiled on a Linux machine running on a 64-bit Intel computer, both your pair and the one above produce the following (identical) contents in the data section:

Disassembly of section .data:
0000000000000000 <hello>:
   0:    48 65 6c 6c 6f 00                                   Hello.
0000000000000006 <world>:
   6:    2c 20 57 6f 72 6c 64 21 00                          , World!.

If you're using Linux you can try that out; let me know and I'll show you the commands as an addendum below.

Notice that in both cases a 00 byte appeared at the end. In your case it was explicitly declared by you in the array; and in the second case it was implicitly injected by the C compiler when emitting the data corresponding to the <hello> and <world> symbols.

Okay, so now that you understand how that works; you can see that:

// This is bad: :-)

for (int i = 0; i < 6; i++) {
    if (hello[i] == '\0') {
        for (int j = 0; j < 9; j++) {
            int index = 5 + j;
            hello[index] = world[j];
        }
    }
}

The looping above is very weird. Actually there are a bunch of things wrong with it (e.g. the loop nested inside the outer for loop is wrong);

But rather than pointing out the problems, let's just look at the basic correct solution.

When you program for strings you DON'T know how big they are; so the condition of the form i < N in the for loops dealing with strings is not the usual way to go.

Here is a way to loop through the characters in a string (a char array terminating with \0):

 char *p; /* Points to the characters in strings */
 char str[] = "Hello";

 for ( p = str; *p != 0; p++ ) {
     printf("%c\n", *p);
 }

So let's figure out what's happening here:

  for ( p = str; ...
        ^^^^^^^^^

p is a char pointer. At the start we point it at str (the address where the array str sits in memory when the program runs) and check whether the value at that location (obtained with *p) is equal to '\0' or not:

  for (p = str; *p != 0; ...)
                   ^^^^^^^

If it isn't, the condition is true and we run the loop body. In our case *p == 'H', so we enter the loop:

  for (p = str; *p != 0; p++)
                         ^^^

This third expression is where we do our increment / decrement / something else. A for loop always evaluates it at the END of each pass, after the statements in the loop body have run (that is true whether you write p++ or ++p). So the loop first enters the { ... } that does its thing, then the ++ moves p (which is a memory address) on to the next character, and we get to the condition check again:

  for (p = str; *p != 0; p++)
                ^^^^^^^

So you can see that p points in turn to the memory locations of 'H', 'e', 'l', 'l', 'o', and '\0'; when it reaches '\0' the condition fails and the loop exits.

Concatenating strings:

So now we know that we want to concatenate "Hello" and ", World!".

First we need to find the end of Hello, and then we need to start sticking the ", World!" onto the end of it.

Well, we know that our for loop above finds the end of the string. So if we do nothing in the loop body, then when the loop finishes p will point to where the '\0' at the end of Hello is:

char str1[] = "Hello";
char str2[] = ", World!";

char *p; /* points into str1 */
char *q; /* points into str2 */


for (p = str1; *p!=0; p++) {
  /* Skip along till the end */
}
/* Here p points to '\0' in str1 */

/* Now we start to copy characters from str2 to str1 */
for (q = str2; *q != 0; p++, q++ ) {
   *p = *q;
}

Note that on the first pass p was pointing to the '\0' at the end of str1, so when we assign *p = *q that '\0' gets replaced by ','. The '\0' disappears from str1 completely, so we'll have to put one back at the end. Note also that we increment both p and q at the end of each pass and keep looping while *q != 0.

Now that the loop ends we stick a '\0' at the end because we destroyed the one we had:

*p = 0;

And that is concatenation.

Important part about memory

If you look at the disassembly output above, Hello\0 took up six bytes, and , World!\0 started at address 0000000000000006 (hello started at 0000000000000000) in the data segment.

This means that if we write past the end of str1[] when it doesn't have enough space, which is our case (the reason is explained below), we may end up overwriting memory that belongs to something else (str2[], for example). In C terms this is undefined behaviour.

The reason we don't have enough memory is that we declared a character array just big enough to hold its initialization value:

char str[] = "Foofoo";

will make str be exactly 7 bytes.

But we can ask C to give more space to str than just the initialization value. For example,

char str[20] = "Foofoo";

This will give str 20 bytes and set the first seven to "Foofoo\0". The remaining bytes are set to \0 as well.

So the disassembling above would look like:

Disassembly of section .data:

0000000000000000 <str>:
   0:    46 6f 6f 66 6f 6f 00 00 00 00 00 00 00 00 00 00     Foofoo..........
  10:    00 00 00 00                                         ....

Remember in C you have to think like a computer. If you don't explicitly ask for memory you won't have it. So if we are to do your concatenation either we have to use an array that's big enough because we explicitly declared it that way:

  char foo[1000]; /* Lots of room */

Or we ask for a memory location at run time using malloc (a topic for another post).

Let's just look at a working solution then:

concat.c:

#include <stdio.h>

char str1[100] = "Hello";
char str2[] = ", World!"; /* No need to make this big */

int main()
{
    char *p;
    char *q;

    printf("str1 (before concat): %s\n", str1);

    for (p = str1; *p != 0; p++) {
        /* Skip along to find the end */
    }

    for (q = str2; *q != 0; p++, q++ ) {
        *p = *q;
    }
    *p = 0; /* Set the last character to 0 */

    printf("str1 (after concat): %s\n", str1);

    return 0;
}

Disassembling on Linux:

If you compile the above into JUST an object file and don't link it to an executable, you'll keep things less messy:

  gcc -c concat.c -o concat.o

You can disassemble concat.o using object dump:

  objdump -d concat.o

You'll notice a LOT of unnecessary code in the dump dealing with the printf statements:

   0:    55                       push   %rbp
   1:    48 89 e5                 mov    %rsp,%rbp
   4:    48 83 ec 10              sub    $0x10,%rsp
   8:    be 00 00 00 00           mov    $0x0,%esi
   d:    bf 00 00 00 00           mov    $0x0,%edi
  12:    b8 00 00 00 00           mov    $0x0,%eax
  17:    e8 00 00 00 00           callq  1c <main+0x1c>

So to get rid of it, just comment out the printf calls in your code. Then recompile with optimization turned on:

gcc -O3 -c concat.c  -o concat.o

Now you'll get much cleaner output.

The -O3 removes some of the frame-pointer-related instructions (a MUCH later subject), and the assembly will be specific to your own code:

Here is the concat.o output when compiled as above and dumped using:

objdump -S -s concat.o


concat.o:     File format elf64-x86-64

Contents of section .text:
 0000 803d0000 000000b8 00000000 740b6690  .=..........t.f.
 0010 4883c001 80380075 f70fb615 00000000  H....8.u........
 0020 84d2741d b9000000 000f1f80 00000000  ..t.............
 0030 4883c101 88104883 c0010fb6 1184d275  H.....H........u
 0040 efc60000 31c0c3                      ....1..
Contents of section .data:
 0000 48656c6c 6f000000 00000000 00000000  Hello...........
 0010 00000000 00000000 00000000 00000000  ................
 0020 00000000 00000000 00000000 00000000  ................
 0030 00000000 00000000 00000000 00000000  ................
 0040 00000000 00000000 00000000 00000000  ................
 0050 00000000 00000000 00000000 00000000  ................
 0060 00000000 2c20576f 726c6421 00        ...., World!.
Contents of section .comment:
 0000 00474343 3a202844 65626961 6e20342e  .GCC: (Debian 4.
 0010 342e352d 38292034 2e342e35 00        4.5-8) 4.4.5.
Contents of section .eh_frame:
 0000 14000000 00000000 017a5200 01781001  .........zR..x..
 0010 1b0c0708 90010000 14000000 1c000000  ................
 0020 00000000 47000000 00000000 00000000  ....G...........

Disassembly of section .text:

0000000000000000 <main>:
   0:    80 3d 00 00 00 00 00     cmpb   $0x0,0x0(%rip)        # 7 <main+0x7>
   7:    b8 00 00 00 00           mov    $0x0,%eax
   c:    74 0b                    je     19 <main+0x19>
   e:    66 90                    xchg   %ax,%ax
  10:    48 83 c0 01              add    $0x1,%rax
  14:    80 38 00                 cmpb   $0x0,(%rax)
  17:    75 f7                    jne    10 <main+0x10>
  19:    0f b6 15 00 00 00 00     movzbl 0x0(%rip),%edx        # 20 <main+0x20>
  20:    84 d2                    test   %dl,%dl
  22:    74 1d                    je     41 <main+0x41>
  24:    b9 00 00 00 00           mov    $0x0,%ecx
  29:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
  30:    48 83 c1 01              add    $0x1,%rcx
  34:    88 10                    mov    %dl,(%rax)
  36:    48 83 c0 01              add    $0x1,%rax
  3a:    0f b6 11                 movzbl (%rcx),%edx
  3d:    84 d2                    test   %dl,%dl
  3f:    75 ef                    jne    30 <main+0x30>
  41:    c6 00 00                 movb   $0x0,(%rax)
  44:    31 c0                    xor    %eax,%eax
  46:    c3                       retq
Community
Ahmed Masud
  • Your answer is amazing. It's more than I expected. Great thank you. I wish you finish it :) And yes, I'm using Linux and interested to see machine representation of string. If you will specify commands, it will be very nice :) – Viacheslav Kondratiuk May 29 '13 at 08:16
  • the core is finished, i'll put in the commands for disassembling – Ahmed Masud May 29 '13 at 08:27
  • added a disassembling guide. I may end up using this for a C lecture :) – Ahmed Masud May 29 '13 at 08:44
  • 1. Have you ever heard of the "abstract machine"? It contradicts your idea that C is somehow bound to implement that which is native to *the CPU*. Which CPU has a `sizeof` operator? 2. Have you ever read [c-faq.com](http://c-faq.com/)? There's a question I think you might be interested in: [Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?](http://c-faq.com/null/machexamp.html). 3. Are you aware that the value for `'0'` in EBCDIC is 250, not 48? C was rationalised to be portable. – autistic May 29 '13 at 15:37
  • 4. "*we'll end up overwriting part of memory that belongs to something else*" is invalid. I think you've misunderstood the concepts of "undefined behaviour" and "abstract machine". – autistic May 29 '13 at 15:40
  • @undefinedbehaviour I wrote this for a beginner, and gave him a solid anchor, if we cover all the permutations and nuances then there is a lot of subtlety missing in my answer that should be explained. I am not confused about differences between a real machine and abstract notions of a machine, but at this stage I needed my reader to anchor on to something tangible; Same as teaching arithmetic; We start by 3 apples + 3 apples = 6 apples and not groups and rings even though that's the correct abstraction. And yes :-) quite aware of different char encodings; – Ahmed Masud May 29 '13 at 16:09
  • Students don't need to know that `NULL` is 0 (invalid) or that `'0'` is 48 (also invalid); They need to understand that `NULL` points to *nothing*, that `'0'` is a character constant for the numeric digit zero and `'\0'` is a character constant for a null character... Why teach something unnecessarily complex and invalid, when it's less effort to teach portable C accurately? – autistic May 29 '13 at 16:32
  • actually NULL is guaranteed to be ((void *)0) in C ... and it's up to the compiler to interpret that to the correct machine equivalence. Now as for me being specific about the character '0' being different from '\0' I think that using ASCII as an example is perfectly valid. IMHO my reply, within the context of the OP query, serves OP more than not. However, if you think that I have really botched up my answer, I invite you to edit my reply, or to post a more acceptable answer and then down-vote mine if need be. I am happy to learn as much as teach. – Ahmed Masud May 29 '13 at 19:41
  • That first statement is also invalid. `NULL` "expands to an implementation-defined null pointer constant" (section 7.19p3 of n1570.pdf, the C11 standard draft), which is *not* 0 but a pointer that points to *nothing*. 0 can be converted to a null pointer and back to an integer, but that doesn't make the pointer zero; Zero is a bit pattern of all zeros, which `NULL` isn't required to be. See the link all the way back up there. No. I'll let that be your choice to teach portable C or to teach non-portable non-C. You may want to consider re-evaluating your definition of *byte*, *short* and *long*. – autistic May 30 '13 at 01:20
  • Whether or not you make that choice is up to you... but consider the future: Those who support their claims with references are more likely to be kept on board at educational institutes, and in this case the C11 standard draft is if not the most significant, then a very important reference for deciding *what is C* and *what isn't*. – autistic May 30 '13 at 01:24
  • 2
    [Related question](http://stackoverflow.com/questions/5142251/redefining-null) about NULL versus null pointers. Also, the [comp.lang.c FAQ](http://c-faq.com/null/index.html) has a nice chapter on the topic. – Lundin May 30 '13 at 07:04
2

Only six bytes of memory are allocated for hello, so there is no room to append anything to it. Try allocating a new, larger buffer for the concatenated string.

Refer here for the strcat() implementation.

Peter Mortensen
Jeyaram
2

You don't need to define your strings in such a meticulous way. This also works:

char hello[] = "Hello";
char world[] = ", World!";

C will take care of null-terminating them for you.

You can also copy and advance both pointers in a single expression. A common idiom is:

while(*destination++ = *source++)
    ;

This first assigns the char that source currently points to to the location destination points to, and then increments both pointers (only the pointers, not what they point to). Postfix ++ binds more tightly than *, but it yields the pointer's old value, so the dereference applies to the position before the increment. Both pointers advance in step.

E.g. after the loop has run once, the first character of source has been copied, and both destination and source point to the second position of their respective arrays.

Eventually the character copied is \0. The assignment expression then has the value 0, which the while loop treats as false, so copying stops right after the terminator has been written.

As this idiom (and strcat()) is considered somewhat unsafe, make sure you have enough space in destination before using it. Alternatively, use strncat(), which lets you limit how many characters are copied (if the source is not null-terminated and you let it 'rip', so to speak, without a limit, bad things can happen).

You can use the above like this:

void strcopycst(char* destination, char* source)
{
    while((*destination++ = *source++))
    ;
}

In your main:

char dest [25];
char source[] = "Hello, World!";

strcopycst(dest, source);

EDIT: As a commenter mentioned I didn't address the concatenation issue properly. Based on the code above here's a crude strcat function:

void cstmstrcat(char* dest, char* source1, char* source2) /* dest must be big enough */
{
    while((*dest++ = *source1++))
        ;

    --dest; /* step back onto the '\0' just copied so that
               source2 overwrites it */

    while((*dest++ = *source2++))
        ;
}

And here's how it's used:

int main()
{
    char source1 [] = "Hello";
    char source2 [] = ", World!";
    char dest [50];

    cstmstrcat(dest, source1, source2);

    printf("%s\n", dest);

    return 0;
}

It prints "Hello, World!".

Nobilis
  • The "idiom" would rather be `while(*destination != '\0') { *destination = *source; destination++; source++; }`. This is exactly the same and yields exactly the same machine code, it is just far more readable and doesn't use assignment inside the if statement, which is dangerous and poor style. – Lundin May 29 '13 at 06:23
  • 1
    Fair enough, in this case let me link a more proper explanation of this: http://stackoverflow.com/questions/810129/how-does-whiles-t-work But you can't argue that it's not common :) – Nobilis May 29 '13 at 06:27
  • btw he is asking for concatenation not copying – Ahmed Masud May 29 '13 at 07:47
  • It is common for children to eat things from the floor. Does that mean it's a good idea? – autistic May 29 '13 at 15:48
  • @Lundin No, it doesn't. That won't work correctly if the indeterminate value of the first byte from the uninitialised variable `destination` is `'\0'`. Perhaps you meant `for (*destination = *source; *destination != '\0'; source++, destination++, *destination = *source);` or something... – autistic May 29 '13 at 15:54
  • @undefinedbehaviour Oops. No, I meant `while(*source != '\0') ...`. – Lundin May 30 '13 at 06:51
2

You can solve the access out of bounds of the array by allocating enough memory...

char hello[14] = "Hello";
V-X
0

You're attempting to store data out of bounds of that array.

char hello[] = { 'H', 'e', 'l', 'l', 'o', '\0' };

How many chars can you store in hello? Let us check.

#include <stdio.h>
int main(void) {
    char hello[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
    printf("%zu\n", sizeof hello);
}

Output: 6. That means hello[0] through to hello[5] are valid indexes. hello[6] and beyond is invalid. You'll need to declare a large enough array to store the result of the concatenation, like so:

#include <stdio.h>
#include <string.h>
int main(void) {
    char hello[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
    char world[] = { ',', ' ', 'W', 'o', 'r', 'l', 'd', '!', '\0' };

    /* Don't forget to add 1 for NUL */
    char hello_world[strlen(hello) + strlen(world) + 1];

    strcpy(hello_world, hello);
    strcat(hello_world, world);
    puts(hello_world);
}
autistic