I have some basic experience with C, but it usually takes me a little while to figure out how to implement something; using pointers and such is still a bit of a mystery to me.

Then I see an example like this strcat implementation and I can't follow along. Would someone mind kindly explaining it to a C newcomer?

char *
my_strcat(char *dest, const char *src)
{
    char *rdest = dest;

    while (*dest)
      dest++;
    while (*dest++ = *src++)
      ;
    return rdest;
}

When I read that, I think "rdest = ?, maybe real destination". So set a pointer to the original destination. Then "while (*dest) dest++;", what is that doing? Same with the next line. I don't follow.

Is it using any additional memory to the original two parts (src and dest)? Like in JS, if you concatenate 2 strings, it creates memory for a third string that combines the two, so you have double the memory. How is this avoided in this C implementation (if it is)?

Lance

6 Answers

char * my_strcat(char *dest, const char *src)
{
    // Standard dictates strcat() to return dest.
    // That is pretty useless (returning a pointer to the
    // *end* of dest would have been better), but that's
    // the way it is.
    // Since we iterate dest as part of the implementation,
    // we need to "remember" its original value.
    char *rdest = dest;

    // Iterate over the characters pointed to by dest until
    // we find the end (the null byte terminator), which is "false".
    while (*dest)
      dest++;

    // An assignment evaluates to the value assigned. So assigning
    // one character at a time (*dest = *src) will eventually
    // evaluate to false when we assign the null byte terminator
    // from src (incidentally also terminating dest). Since we
    // postfix-increment both pointers during the assignment, we
    // don't need any actual body for the loop.
    while (*dest++ = *src++)
      ;

    // Return the "remembered" original dest value.
    return rdest;
}
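
As an aside on the first comment in the code: a variant that returns the *end* of the result could look roughly like the sketch below. The name my_strcat_end is made up for illustration and is not a standard function.

```c
/* Hypothetical variant: appends src to dest and returns a pointer to the
   terminating '\0' of the result instead of the start of dest. */
char *my_strcat_end(char *dest, const char *src)
{
    while (*dest)              /* walk to the terminator of dest */
        dest++;
    while ((*dest = *src)) {   /* copy, then stop once '\0' was stored */
        dest++;
        src++;
    }
    return dest;               /* points at the terminator */
}
```

Passing the returned pointer as dest to the next call means the first loop has nothing left to skip, so repeated appends do not rescan the whole string.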

Is it using any additional memory to the original two parts (src and dest)? Like in JS, if you concatenate 2 strings, it creates memory for a third string that combines the two, so you have double the memory. How is this avoided in this C implementation (if it is)?

A precondition for strcat is that dest must have enough space to hold the end result. So, no, it neither needs nor allocates additional memory. It is up to you to make sure there is enough memory, or to realloc more memory before you call strcat.
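
One way to check that precondition yourself before calling strcat is sketched below. The helper append_if_fits and its dest_size parameter are assumptions for illustration; nothing like it exists in the standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to the string in dest only if the
   result, including the terminating '\0', fits in dest_size bytes.
   Returns 1 on success, 0 if there is not enough room. */
int append_if_fits(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    size_t extra = strlen(src);

    if (used >= dest_size || extra >= dest_size - used)
        return 0;            /* no room for the extra characters plus '\0' */

    strcat(dest, src);       /* the precondition now holds */
    return 1;
}
```

With a 16-byte buffer holding "Hello", appending ", world" succeeds (12 characters plus the terminator), while a further "!!!!" is rejected and the buffer is left unchanged.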

DevSolar
  • Can you explain `while (*dest)` in more depth, same with `while (*dest++ = *src++)`? How does the pointer-ness work in this situation? What is `(*dest)` getting, why not `while (dest)`? etc. – Lance Jun 01 '20 at 12:47
  • @LancePollard You need to dereference the pointers to copy the character values themselves. `while (dest)` would always evaluate to `true` if `dest` isn't a null pointer. When you use `dest` you use the value of the pointer (actually an address), not the value of the referenced object. – RobertS supports Monica Cellio Jun 01 '20 at 12:48
  • @LancePollard: `dest` is the address pointed to, `*dest` is the *value*, i.e. the character in this case, at the address pointed to. A C string is terminated by a zero byte (`'\0'`). Zero is false, anything not zero is true. So `while (*dest)` will loop until the terminating zero byte is found. We are checking the *character*, not the address. – DevSolar Jun 01 '20 at 12:50
  • @LancePollard: Comments extended. – DevSolar Jun 01 '20 at 12:52
  • Just as an aside, this code clearly demonstrates why one should be careful using this function (and others) that copy data from a source address to a destination address. No size is passed in, so the function assumes, per the C standard, two things: (1) `src` ends in a zero byte, and (2) `dest` has enough space to hold the copied data. If either of those conditions does not hold, we have 'undefined behavior'. – DNT Jun 01 '20 at 12:53
  • @DNT: That's called "preconditions". For example, if `src` does not end in a zero byte, it's not a string, and `strcat` assumes that both `src` and `dest` *do* point to strings. It is (by necessity) a basic assumption of C that the programmer knows what he is doing, has checked the preconditions, and does any necessary bookkeeping himself. There are other functions like `strncpy`, `strcpy_s`, or `strncpy_s` that do more checking. – DevSolar Jun 01 '20 at 12:57
  • @DevSolar Yes, they are preconditions, but they are not enforced in any way within the functions for many good reasons, except by versions implemented in some debug libraries. Since a 'basic knowledge of C' was mentioned, I decided to add a little side-comment to point them out. – DNT Jun 01 '20 at 13:07
  • @DNT: That was what I wanted to point out as well. As a general rule, C doesn't "check" or "enforce" preconditions. Most of the time, you *cannot* check the preconditions (from the standpoint of a library implementor). That's just the way this language rolls, and part of why it works so well in embedded applications where every byte and clock cycle counts. – DevSolar Jun 01 '20 at 20:56

const char *src

src shouldn't be modified by the function, hence use const correctness to mark it as read-only.

char *rdest = dest;

Save the original position until later, since there's a requirement that strcat should return a pointer to the first element of the merged string (return rdest;).

while (*dest)
  dest++;

The while loop is implicitly looking for the null terminator. Meaning: find the end of the first string, so that after this loop, dest points at the null terminator of that string.

while (*dest++ = *src++)

This is a common, although admittedly confusing, idiom in C. (This one line actually implements strcpy.) Operator precedence says postfix ++ binds tighter than prefix *, which binds tighter than assignment =.

So ++ is applied to the pointers, not to the pointed-at data. But since it is postfix, each ++ expression yields the pointer's old value; the increment itself takes effect as a side effect, at some point before the end of the full expression.

* then dereferences those old pointer values, and = copies the character from *src to *dest. In other words, the copy uses the addresses from before the increment.

Finally, there is an implicit check against null termination, since the result of the = operator can itself be tested: it is the value that was stored in the left operand, in this case *dest. Note that the null terminator gets copied, too.

You could rewrite this while loop in a less confusing way:

*dest = *src;
while(*src != '\0')
{
  dest++;
  src++;
  *dest = *src;
}
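
Dropped into the whole function, that rewritten loop might look like the sketch below. The name my_strcat_verbose is made up here; the behaviour is meant to match my_strcat from the question.

```c
/* Hypothetical name; same behaviour as my_strcat, written out step by step. */
char *my_strcat_verbose(char *dest, const char *src)
{
    char *rdest = dest;          /* remember the start for the return value */

    while (*dest != '\0')        /* walk to the terminator of dest */
        dest++;

    *dest = *src;                /* copy the first character (may be '\0') */
    while (*src != '\0') {       /* stop once the terminator has been copied */
        dest++;
        src++;
        *dest = *src;
    }
    return rdest;
}
```
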
Lundin

The crucial thing to understand in this code is the way C handles strings (an array of characters terminated by '\0'). The first thing to do is ditch the analogy of a string as a word, and think of it on a character-by-character basis.

The dest argument of the function represents the pointer to the first character of the destination string. To add more characters after the dest string, we need to get to its '\0' terminator, because that's where the second string will land. That's the purpose of this loop:

while (*dest)
      dest++;

(The condition (*dest) is equivalent to (*dest != '\0'), because '\0' has a numerical value of 0, which is treated as false.)
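
The same "walk until the terminator" loop is enough to measure a string, which may make it easier to see what it does in isolation. A small sketch, with my_strlen as a made-up name:

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters before the terminator using
   the same "while (*p)" test as the first loop of my_strcat. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    while (*p)                /* same as: while (*p != '\0') */
        p++;
    return (size_t)(p - s);   /* distance walked = string length */
}
```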

After we've gotten to the position where the second string needs to begin, we start copying it character by character:

while (*dest++ = *src++)
      ;

Note that (*dest++ = *src++) has a single '=' character, meaning it is an assignment, not a comparison. The value being tested inside the parentheses is the thing getting assigned, i.e. *src. So, it will continue as long as (*src != '\0'), which happens to be where the second string ends. Also note that the '\0' character IS ALSO COPIED in these assignments, which is an absolute must, because without it the resulting string wouldn't be terminated (so, technically speaking it wouldn't even be a valid string).
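
To convince yourself that the '\0' really is copied, you can fill a buffer with a visible marker first and look at what the call overwrote. This is only a sketch: terminator_was_copied is a made-up name, and my_strcat is repeated from the question so the snippet compiles on its own.

```c
#include <string.h>

/* The function from the question, repeated so the sketch compiles alone. */
char *my_strcat(char *dest, const char *src)
{
    char *rdest = dest;

    while (*dest)
        dest++;
    while ((*dest++ = *src++))   /* extra parentheses: assignment is intended */
        ;
    return rdest;
}

/* Hypothetical demo: returns 1 if the '\0' from src was copied too. */
int terminator_was_copied(void)
{
    char buf[8];

    memset(buf, 'X', sizeof buf);   /* fill with a visible marker */
    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = '\0';                  /* buf now holds the string "ab" */

    my_strcat(buf, "cd");           /* writes 'c', 'd', '\0' at buf[2..4] */

    return buf[4] == '\0' && buf[5] == 'X';
}
```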

Great, now that we've copied the string where it needs to be, we need to return the pointer to the first character. Ah, but we've moved the pointer in the first loop! That's where rdest comes in, saving the initial position before the loops so that we can return it at the end.

Dharman
Randoragon

Let's start with the function declaration:

char * my_strcat(char *dest, const char *src)

This function returns a pointer to char. Its arguments are also pointers to char; each points to the beginning of one of the char arrays passed in. Since src is not to be altered, it is declared const.

This assignment:

char *rdest = dest;

This declares a pointer and makes it point to the beginning of the array passed in through dest.

The cycle:

while (*dest)
  dest++;

As you might know, any string in C is terminated with '\0'. The value of that null terminator is 0, so it can be used as a stop condition.

So essentially this pointer, which starts at the beginning of dest, is incremented until it reaches the end of the string.

The cycle:

while (*dest++ = *src++)
  ;

Now that dest points to the end of the dest string (its null terminator), the loop increments both pointers and copies every character of src, starting with the first, onto the end of the dest string. When the \0 is copied, the expression evaluates to 0 (false), which again is the stop condition, and the resulting string has its null terminator.

The return:

return rdest;

This pointer remained unaltered through the function and is pointing to the beginning of the dest string, which now has the appended src also. That's what we want to return.
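
Because the call returns the start of dest, calls can be nested. A small sketch using the standard strcat, which follows the same return convention; build_abc is a made-up name and assumes the caller passes a buffer of at least 6 bytes.

```c
#include <string.h>

/* Hypothetical demo: builds "a-b-c" in out, which must hold at least 6 bytes. */
char *build_abc(char *out)
{
    out[0] = '\0';   /* start with an empty string */

    /* Each call returns its first argument, so the calls can be nested. */
    return strcat(strcat(strcat(out, "a"), "-b"), "-c");
}
```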

anastaciu

If you change it a bit, the role of rdest becomes clearer:

char * my_strcat(char *dest, const char *src)
{
    char *workdest = dest;

    while (*workdest) workdest++;
    while (*workdest++ = *src++);
    return dest;
}

Now we use a work pointer just to iterate and return the original dest

Is it using any additional memory to the original two parts (src and dest)? Like in JS, if you concatenate 2 strings, it creates memory for a third string that combines the two, so you have double the memory. How is this avoided in this C implementation (if it is)?

This version (and the standard library strcat as well) does not allocate any memory; the caller has to make sure that dest is writable and large enough to accommodate the concatenated strings.

If you want the function to allocate the memory itself, you need to write another version of it:

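#include <stdlib.h>   /* malloc */
#include <string.h>   /* strlen, strcpy */

/* Returns a newly allocated string holding dest followed by src,
   or NULL if the allocation fails. Neither argument is modified. */
char * my_strcat_s(char *dest, const char *src)
{
    size_t destlen = strlen(dest);
    char *workdest = malloc(destlen + strlen(src) + 1);

    if(workdest)
    {
        strcpy(workdest, dest);
        strcpy(workdest + destlen, src);
    }
    return workdest;
}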

But freeing the allocated memory is the programmer's responsibility.

0___________

A string is just an array (buffer) of chars, that is, an array of small integers (a char is at least 8 bits wide, and whether it is signed is implementation-defined). The string ends at the first '\0' in the array. The actual array can be much bigger than the string occupying it, and strcat indeed requires that dest is big enough to contain both the dest string and the source string together. strcat is not a ready-to-use method like in higher-level languages. Its use case looks like this:

  1. char* buffer = malloc(strlen(string1) + strlen(string2) + 1) Create a buffer that's big enough to contain both strings plus the terminator.
  2. strcpy(buffer, string1) Copy the first string to the buffer.
  3. strcat(buffer, string2) Append the second string to the buffer where the first string ends.
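
Put together, the three steps above could be sketched like this. join_strings is a made-up name, and the NULL check and the "caller frees" rule are additions that the list does not spell out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper following the three steps above. Returns a newly
   allocated string, or NULL if malloc fails; the caller must free() it. */
char *join_strings(const char *string1, const char *string2)
{
    /* 1. Room for both strings plus one terminating '\0'. */
    char *buffer = malloc(strlen(string1) + strlen(string2) + 1);

    if (buffer == NULL)
        return NULL;

    strcpy(buffer, string1);   /* 2. Copy the first string. */
    strcat(buffer, string2);   /* 3. Append the second one after it. */
    return buffer;
}
```

This is the C counterpart of what JS does behind the scenes: a third block of memory is created, only here you request it and release it yourself.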

The ++ and -- operators allow a pointer to serve as an enumerator. Think of those as .next() and .prev(). The caveat here is that the postfix forms return (or accept) the value BEFORE moving the enumerator. This is critical here, this basically is what allows C to be so hard ;) If you want to recreate this in a higher-level language, it'll be getAndNext() and setAndNext().

* is an accessor, working both ways, so it's enumerator's getValue() and setValue().

The first block just skips through the dest buffer until it reaches the end of the string in it, but NOT the end of the buffer.

while (*dest)
    dest++;

in pseudo-code is:

while (dest.get() != '\0')
    dest.next();

That's because \0 is a real integer zero, and integer zero counts as false in a boolean context. Anything non-zero is true. That means -1, 42 and 'A' are just as true as 1. So in C we just skip the != 0, which is as pointless as writing != false in a language that has real booleans.

while (*dest++ = *src++)
  ;

Can be restated as:

while (dest.setAndNext(src.getAndNext()) != '\0')

or without the compounding:

char value;
do
{
    dest.set(src.get());
    value = src.get();
    src.next();
    dest.next();
}
while (value != '\0');

That's because in C an assignment has a value. So (*dest++ = *src++) ultimately returns the character that had been copied. It's like an inline function that copies, advances and then returns what was copied.
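
In real C, that "assignment has a value" behaviour can be seen in isolation with something like the sketch below. copy_one is a made-up helper that copies a single character and reports whether the loop would keep going.

```c
/* Hypothetical demo: copies one character and reports the value of the
   assignment expression itself. */
int copy_one(char *dest, const char *src)
{
    char copied = (*dest = *src);   /* the assignment yields the stored char */

    return copied != '\0';          /* 1 = keep going, 0 = terminator copied */
}
```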

A pointer can legally point one element past the end of an array, as long as it is not dereferenced there. It's like an enumerator that has reached the end, with nothing more to read. The big difference is that a high-level enumerator can and will tell you that (via an exception), while a pointer will keep going even though it doesn't make any sense anymore. That's why it is fine that both the src and dest pointers are ++ed one step past the terminator they just copied: we don't care, because we've taken care never to use them after that.

rdest is simply the saved position of where the buffer started. We can't return dest, because that enumerator has been used up and is now at the end of the string, while we need to return the beginning. "r" probably stands for "return", because the whole point of this variable is to be returned.

Agent_L