2

Based on suggestions in this question here, I've written the following program that finds, and replaces, a target substring in a string. (It could be run in a loop to find and replace all instances of "target" using the strstr() string method, but I just give an example here.)

#include <stdio.h>
#include <string.h>

#define SUCCESS 1
#define FAILURE 0
#define STR_CAP 80 + 1

int stringify(char *string, const char *target, const char *replacement) {
    char segment[STR_CAP];
    int S = strlen(string), T = strlen(target);
    char pre_segment[STR_CAP];
    char post_segment[STR_CAP];

    for (int i = 0; i < S; i++) {
        strncpy(segment, string + i, T);
        segment[T] = '\0';
        if (strcmp(segment, target) == 0) {
            memcpy(pre_segment, string, i);
            /*printf("pre_segment: %s\n", pre_segment);*/
            pre_segment[i] = '\0';
            memcpy(post_segment, string + i + T, S - (i + T));
            post_segment[S - (i + T)] = '\0'; /*WHAT IS THIS MAGIC?*/
            /*printf("post_segment: %s\n", post_segment);*/     
            strcat(pre_segment, replacement);
            /*printf("pre_segment after concat.: %s\n", pre_segment);*/
            strcat(pre_segment, post_segment);
            /*printf("pre_segment after concat,: %s\n", pre_segment);*/
            strcpy(string, pre_segment);
            return SUCCESS;
        }   
    } return FAILURE;
}

int main() {

    char string[] = "The quick brown fox jumped over the lazy, brown dog.";
    char target[] = "brown";
    char replacement[] = "ochre-ish";

    stringify(string, target, replacement);
    printf("%s\n", string);
    stringify(string, target, replacement);
    printf("%s\n", string);

    return 0;

}

But there are some things I don't understand.

(1) For one thing: before I used memcpy() and manually set the ends of the strings I copied to '\0', I kept getting buffer-overflows (in the form of stack-smashing), and weird errors. What effect does this have that just naively using strncpy() doesn't? For example, if you comment out the "MAGIC" line, the program just about implodes.

(2) Is there a nicer or more canonical way of doing this? This feels very clunky, but I couldn't find any basic "how-tos" on substring replacement.

Chris
  • 215
  • 1
  • 12
  • To be clear - you want to replace a 'target' sub-string with a 'replacement' sub-string that is longer than the target, in-place, without allocating a new string for the output? – ThingyWotsit Jun 22 '17 at 00:55
  • @ThingyWotsit In this instance, yes - the background to this question is a program where invalid situations like that get caught and handled external to the operation of this function. – Chris Jun 22 '17 at 00:56
  • Well, since it's undefined behaviour, I don't think that is possible. – ThingyWotsit Jun 22 '17 at 00:58
  • @ThingWotsit Not sure what you mean. If "replacement' is too long to be put into "string" without overflowing the 80 non-null characters allowed, the other part of this program will check for that, see that, and refuse to call "stringify". – Chris Jun 22 '17 at 00:59
  • 1
    @Chris You again forgot to append the copied string with a zero character at least in thisstatement strncpy(segment, string + i, T); as I already pointed in your previous question.:) – Vlad from Moscow Jun 22 '17 at 00:59
  • @VladfromMoscow oh no. segment[T] = '\0', right? – Chris Jun 22 '17 at 01:01
  • @Chris It is better now.:) – Vlad from Moscow Jun 22 '17 at 01:01
  • @VladfromMoscow So thank you once again for pointing that out! I'm still not sure I 100% get what situations you need to manually add '\0' in, but it seems like copying strings is one of them? – Chris Jun 22 '17 at 01:02
  • 2
    @Chris When the function strncpy is used instead of strcpy then it can copy a substring of the source string without the terminating zero. In this case the result array will not contain a string. – Vlad from Moscow Jun 22 '17 at 01:05
  • @VladfromMoscow ahhh. Ok, that makes sense. – Chris Jun 22 '17 at 01:05
  • 1
    **Please stop using `strncpy`!** It is never the right tool in the hands of a beginner and still too error prone for more experienced C programmers. Learn about this here: https://randomascii.wordpress.com/2013/04/03/stop-using-strncpy-already/ – chqrlie Jun 22 '17 at 01:09

3 Answers3

2
  1. memcpy() only copies memory and has no recollection of strings and null terminators. strncpy() will copy the first n characters and will fill the rest with 0s (null terminator) only if the null terminator was found in the n characters, so chances were you gave strncpy() the length of the strings you wanted to copy, which it did, but did not find a null terminator. In this sense, strncpy() acted just like memcpy(). Either way, you needed to manually assign the null terminator. The memory is being corrupted simply because the program reads a string, but does not find a null terminator, and therefore will continue to read, until it goes past the stack, the heap, and the rest of the memory until it reaches a null terminator (not in that order).

  2. There are a lot of ways to do substring replacement, here is another way to do it that is slightly more elegant: What is the function to replace string in C?

EDIT in response to: "Can you explain more about what memcpy() does and why people are so unhappy with strncpy()?"

Imagine your memory looks like this:

0              3    
01 | 30 | 40 | 00 | 32 | 40 

I then execute memcpy(0x03, 0x00, 3); telling the computer to copy 3 bytes from address 0 to address 3. Now my memory looks like this:

0              3    
01 | 30 | 40 | 01 | 30 | 40 

The computer quite literally copies over as I instructed, byte by byte. Lets take a look at what strncpy() would do in this scenario. Starting from the first diagram above, calling strncpy(0x03, 0x00, 3); telling the computer to copy the null-terminated char array (string) at 0x00, to 0x03, but maxing out at 3 characters. The result is:

0              3    
01 | 30 | 40 | 01 | 30 | 40 

Lets imagine instead that we started out with:

0              3    
65 | 00 | 40 | 01 | 30 | 40 

Here, address 0x00 actually refers to a 1 character long string: "A". Now, if we call the same strncpy(0x03, 0x00, 3) the computer first copies over the 65, notices a null terminator, and stops there. Then it fills the remaining parts with 0, with this result:

0              3    
65 | 00 | 40 | 65 | 00 | 00

You can see that if your string is longer than 3 characters, the strncpy would not copy over a null terminator for you, since it will "max out". This is the same as memcpy. strncpy will only move a null terminator over to the destination if it encounters a null terminator in the first n bytes. Therefore, you can see in the first few examples, even though address 0x00 contained a 3 character valid string (terminating with 0 at address 0x03), the strncpy() function did not produce a valid string at the destination (0x03), since 0x04 could be anything (not necessarily a null terminator).

Felix Guo
  • 2,700
  • 14
  • 20
  • So in this instance there wasn't a meaningful difference between strncpy() and memcpy(). Can you explain more about what memcpy() does and why people are so unhappy with strncpy()? – Chris Jun 22 '17 at 00:55
  • `strncpy`: just say no: https://randomascii.wordpress.com/2013/04/03/stop-using-strncpy-already/ – chqrlie Jun 22 '17 at 01:10
  • 1
    @Chris-- because `strcpy()` always copies the `\0` (if there is room, and UB if not), the result is always a string. But, as you have found, `strncpy()` does _not_ always copy the `\0`, and does not add one; yet people often mistakenly believe that it is like other `strX()` functions that result in strings. This source of confusion is removed when using `memcpy()` and `memmove()`, since there is no false expectation of automagically generating strings. – ad absurdum Jun 22 '17 at 01:42
  • @Chris-- I should have added that `strncpy()` _will_ add `\0` characters if `n` is larger than the size of the source string. The complexity in the behavior of `strncpy()` makes the simplicity of using `memcpy()` and `memmove()` appealing. – ad absurdum Jun 22 '17 at 02:07
1

memcpy() just copies a number of bytes from an array to another one. It is a very simple function with 2 caveats:

  • it should not be used when the source and destination arrays overlap.
  • it is meant to copy bytes, so the size must be computed as a number of bytes, which is error prone when copying between arrays of largers types. In this case, the char type is by definition exactly one byte, so the size is just the number of characters, plus 1 for the null terminator if needed.

memmove() does the same as memcpy, but it can be used for overlapping objects. It needs an extra test to handle all cases correctly.

Using these functions, here is a much simpler version of your stringify function:

#include <stdio.h>
#include <string.h>

int stringify(char *dest, const char *target, const char *replace) {
    char *p = strstr(dest, target);
    if (p == NULL) {
        /* no replacement */
        return 0;
    }
    size_t len1 = strlen(target);
    size_t len2 = strlen(replace);
    if (len1 != len2) {
        /* move the remainder of the string into the right place */
        memmove(p + len2, p + len1, strlen(p + len1) + 1);
    }
    memcpy(p, replace, len2);
    return 1;
}

int main(void) {
    char string[80] = "The quick brown fox jumped over the lazy, brown dog.";
    char target[] = "brown";
    char replacement[] = "ochre-ish";

    stringify(string, target, replacement);
    printf("%s\n", string);
    stringify(string, target, replacement);
    printf("%s\n", string);

    return 0;
}

Notes:

  • this function does not check if the destination array has enough space when the replacement is longer than the target string.
  • In your code, the destination string is not long enough, producing a buffer overflow causing undefined behavior.
  • memmove must be used instead of memcpy to shift the remainder of the string because the source and destination arrays overlap.
  • calling stringify iteratively may not produce the expected replacements if the target and replacement strings have a common part: replacing brown with brownish will obviously fail for multiple matches.

Replacing all matches in a loop, and allocating the resulting string is not difficult:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *replace_all(const char *str, const char *target, const char *replace) {
    size_t len1 = strlen(target);
    size_t len2 = strlen(replace);
    size_t matches = 0;
    const char *p, *p0;
    char *dest, *q;

    if (len1 > 0) {
        for (p = str; p = strstr(p, target) != NULL; p += len1)
            matches++;
    }
    if (matches == 0) {
        return strdup(str);
    }
    dest = malloc(strlen(str) - len1 * matches + len2 * matches + 1);
    if (dest != NULL) {
        for (p = str, q = dest; (p = strstr(p0 = p, target)) != NULL; p += len1) {
            memcpy(q, p0, p - p0);
            q += p - p0;
            memcpy(q, replace, len2);
            q += len2;
        }
        memcpy(q, p0, strlen(p0) + 1);
    }
    return dest;       
}

int main(void) {
    char string[] = "The quick brown fox jumped over the lazy, brown dog.";
    char target[] = "brown";
    char replacement[] = "brownish";

    char *p = replace_all(string, target, replacement);
    printf("%s\n", p);
    free(p);

    return 0;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189
0

There is a big problem with your approach: replacement can be long enough that the result string would be longer than 80 characters long.

I'd do it like this

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *stringify(const char *src, const char *target, const char *replacement)
{
    int buffer_len = 0;
    char *buffer = NULL, *tmp;
    if(src == NULL || replacement == NULL || target == NULL)
        return NULL;

    int repl_len = strlen(replacement);
    int target_len = strlen(target);

    while(tmp = strstr(src, target) && target_len)
    {
        // get length of string until target
        int substr_len = tmp - src;

        // get more space
        char *new_buff = realloc(buffer, buffer_len + substr_len + repl_len + 1);

        if(new_buff == NULL)
        {
            // out of memory
            return buffer;
        }

        if(buffer == NULL) // first realloc
            *new_buff = 0;

        buffer = new_buff;

        // copy that what is before the found target
        strncat(buffer, src, substr_len);

        buffer[buffer_len + substr_len] = 0;

        //// copy the replacement at the end of buffer
        strncat(buffer, replacement, repl_len);

        buffer[buffer_len + substr_len + repl_len] = 0;

        buffer_len += substr_len + repl_len;

        //// set src to the next character after the replacement
        src = tmp + target_len;
    }

    if(buffer == NULL) // target never found
    {
        buffer = malloc(strlen(src) + 1);
        if(buffer == NULL)
            return NULL;

        strcpy(buffer, src);
    } else {
        if(*src) {
            // copy the leftover
            int leftover_len = strlen(src);

            char *new_buff = realloc(buffer, buffer_len + leftover_len + 1);

            if(new_buff == NULL)
                return buffer; // could not expand more, return all we've converted

            buffer = new_buff;

            strcat(buffer, src);
        }
    }

    return buffer;
}


int main(void)
{
    char string[] = "The quick brown fox jumped over the lazy, brown dog.";
    char target[] = "brown";
    char replacement[] = "ochre-ish";

    char *new_txt = stringify(string, target, replacement);

    if(new_txt)
    {
        printf("new string: %s\n", new_txt);
        free(new_txt);
    } else
        printf("Out of memory\n");

    return 0;
}

Of course you could improve the code but my code should show how to use realloc and pointer arithmetic which is very important when you start manipulating string or array of data.

Pablo
  • 13,271
  • 4
  • 39
  • 59
  • 1
    you call `strncat` with an uninitialized destination array `buffer`. Why not use `memcpy`? – chqrlie Jun 22 '17 at 01:36
  • 1
    you should also test that `target` is not empty, otherwise the `while` loop will run forever. – chqrlie Jun 22 '17 at 01:38
  • 1
    @chqrlie thanks, I forgot that only `calloc` initializes the memory with 0. With target being blank, I didn't thought about that. I've updated the code. – Pablo Jun 22 '17 at 01:44