Is there a function I can use to replace specific text within a string?

For example: char *test = "^Hello world^"; would be replaced with char *test = "<s>Hello world</s>";

Another example: char *test2 = "This is ~my house~ bud" would be replaced with char *test2 = "This is <b>my house</b> bud"

Jonathan Leffler
Jordie
  • The problem with strings in C is that memory management is *not* automatic. So replacing `^` with `<s>` requires allocation of a bigger buffer, and writing the new string into the buffer. Which is to say that the function you seek is a function that you need to write. Standard functions like `malloc`, `strcpy`, `strcat`, `memcpy`, `sprintf`, and `snprintf` may be useful, but only to help you write your function. – user3386109 Sep 11 '18 at 00:38
  • Another problem is that strings like `char *test = "^Hello world^";` should not be modified. Any attempts to do so lead to undefined behaviour, and usually lead to a crash. This is different from `char test[] = "^Hello world^";` where the value of the string is stored in the array and the array is modifiable. However, to extend a string, you're going to have to indulge in memory management; you need more space for the larger string, so you have to allocate that space somehow. And, because memory management is involved, there is no standard C library function that handles the replace part. – Jonathan Leffler Sep 11 '18 at 00:42
  • There is no library function that does that for you, but it is something you can do with a few lines of code. C provides the tools to do whatever can be done on a computer -- it is up to you to put the pieces together to finish the puzzle. In the case of substitution, the easiest case is knowing how long the *final* string needs to be (+1 for the *nul-terminating* character). That way you can simply allocate *final length + 1* bytes (using `malloc`) and then fill the newly allocated buffer with the reformed text. There are many many examples on this site of just that. Let us know. – David C. Rankin Sep 11 '18 at 01:07
  • See: https://github.com/antirez/sds suggested by https://stackoverflow.com/questions/4688041/good-c-string-library – Will Bickford Sep 11 '18 at 01:46

1 Answer

Before you can begin to replace substrings within a string, you have to understand what you are dealing with. In your example you want to know whether you can replace characters within a string, and you give as an example:

char *test = "^Hello world^";

Declared and initialized as shown above, test is a pointer to a string literal, which is stored in read-only memory on virtually all systems. Any attempt to modify the characters of a string literal invokes Undefined Behavior (and most likely a Segmentation Fault).

As noted in the comments, test could be declared and initialized as a character array, e.g. char test[] = "^Hello world^"; which ensures that test is modifiable, but that does not address the problem where your replacement strings are longer than the substrings being replaced.
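As a minimal sketch of the array case, the helper below (replace_char is a hypothetical name, not a library function) overwrites single characters in place. It is only safe on a modifiable buffer such as char test[]; passing a pointer to a string literal would be Undefined Behavior. Note that it cannot grow the string, which is exactly the limitation described above.

```c
#include <stddef.h>

/* Overwrite every 'from' character in buf with 'to' and return the
 * number of characters changed. buf must be modifiable storage
 * (e.g. a char array), never a string literal.
 */
size_t replace_char (char *buf, char from, char to)
{
    size_t n = 0;               /* count of replacements made */

    for (; *buf; buf++)         /* walk until the nul-terminator */
        if (*buf == from) {
            *buf = to;          /* same-size change, no extra space needed */
            n++;
        }

    return n;
}
```

With char test[] = "^Hello world^"; a call such as replace_char (test, '^', '*') changes the array contents to *Hello world*.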

To handle the additional characters, you have two options: (1) you can declare test[] to be sufficiently large to accommodate the substitutions, or (2) you can dynamically allocate storage for the replacement string, and realloc additional memory if you reach your original allocation limit.

For instance if you limit the code associated with test to a single function, you could declare test with a sufficient number of characters to handle the replacements, e.g.

#define MAXC 1024  /* define a constant for the maximum number of characters */
...
    char test[MAXC] = "^Hello world^";

You would then simply need to keep track of the original string length plus the number of characters added with each replacement and ensure that the total never exceeds MAXC-1 (reserving space for the nul-terminating character).
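A rough sketch of that bookkeeping for the fixed-size case follows. The function name replace_in_place and its cap parameter are assumptions for illustration; it replaces only the first occurrence and refuses the change if the result would not fit in the buffer.

```c
#include <string.h>

#define MAXC 1024   /* illustrative buffer size, as above */

/* Replace the first occurrence of 'find' in buf with 'rep', in place.
 * cap is the total size of buf in bytes. Returns 1 on success, 0 if
 * find is empty or not present, or if the result would not fit.
 */
int replace_in_place (char *buf, size_t cap, const char *find, const char *rep)
{
    char *at = strstr (buf, find);      /* location of the substring */
    size_t len = strlen (buf),
        flen = strlen (find),
        rlen = strlen (rep);

    if (!at || !flen)                   /* nothing to replace */
        return 0;
    if (len - flen + rlen + 1 > cap)    /* +1 for the nul-terminator */
        return 0;

    /* move the tail (including its nul) to make room, then copy rep */
    memmove (at + rlen, at + flen, strlen (at + flen) + 1);
    memcpy (at, rep, rlen);

    return 1;
}
```

Calling it twice on char test[MAXC] = "^Hello world^"; with sizeof test as cap, first with "<s>" and then with "</s>" as the replacement for "^", leaves <s>Hello world</s> in the array.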

However, if you decide to move the replacement code to a separate function, you now have the problem that you cannot return a pointer to a locally declared array. The array is created within the function's stack space, which is destroyed (released for reuse) when the function returns. A locally declared array has automatic storage duration. See: C11 Standard - 6.2.4 Storage durations of objects

To avoid the problem of a locally declared array not surviving the function return, you can simply dynamically allocate storage for your new string which results in the new string having allocated storage duration which is good for the life of the program, or until the memory is freed by calling free(). This allows you to declare and allocate storage for a new string within a function, make your substring replacements, and then return a pointer to the new string for use back in the calling function.
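To illustrate allocated storage duration in isolation, here is a small sketch (wrap and its parameter names are hypothetical, chosen just for this example). The string it builds lives on the heap, so the returned pointer remains valid after the function returns, and the caller is responsible for calling free().

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string holding pre + s + post, or NULL if
 * the allocation fails. The caller must free() the result.
 */
char *wrap (const char *s, const char *pre, const char *post)
{
    size_t prelen = strlen (pre),
        slen = strlen (s),
        postlen = strlen (post);
    char *out = malloc (prelen + slen + postlen + 1);   /* +1 for nul */

    if (!out)                       /* validate the allocation */
        return NULL;

    memcpy (out, pre, prelen);                      /* opening text */
    memcpy (out + prelen, s, slen);                 /* original string */
    memcpy (out + prelen + slen, post, postlen + 1);/* closing text + nul */

    return out;     /* valid until the caller frees it */
}
```

For example, wrap ("Hello world", "<s>", "</s>") returns a pointer to <s>Hello world</s>, which the caller later passes to free().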

For your circumstance, a simple declaration of a new string within a function and allocating twice the amount of storage as the original string is a reasonable approach to take. (You still must keep track of the number of bytes of memory you use, but you then have the ability to realloc additional memory if you should reach your original allocation limit.) This process can continue and accommodate any number of strings and substitutions, up to the available memory on your system.

While there are a number of ways to approach the substitutions, simply searching the original string for each substring, and then copying the text up to the substring to the new string, then copying the replacement substring allows you to "inch-worm" from the beginning to the end of your original string making replacement substitutions as you go. The only challenge you have is keeping track of the number of characters used (so you can reallocate if necessary) and advancing your read position within the original from the beginning to the end as you go.

Your example somewhat complicates the process by needing to alternate between one of two replacement strings as you work your way down the string. This can be handled with a simple toggle flag (a variable you alternate 0,1,0,1,...) which will then determine the proper replacement string to use where needed.

The ternary operator (e.g. test ? if_true : if_false;) can help reduce the number of if (test) { if_true; } else { if_false; } blocks you have sprinkled through your code -- it's up to you. If the if (test) {} format is more readable to you -- use that, otherwise, use the ternary.
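The toggle and the ternary can be combined in a few lines. The helper below is only an illustrative sketch (next_replacement is a made-up name): it picks r1 when the flag is 0 and r2 when it is 1, then flips the flag for the next match.

```c
/* Return the replacement to use for the current match and flip the
 * toggle so the next call returns the other one.
 */
const char *next_replacement (int *toggle, const char *r1, const char *r2)
{
    const char *r = *toggle ? r2 : r1;  /* ternary selects r2 when set */

    *toggle = !*toggle;                 /* alternate 0,1,0,1,... */

    return r;
}
```

Starting with int toggle = 0; successive calls with "<b>" and "</b>" yield "<b>", "</b>", "<b>", and so on.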

The following example takes the (1) original string, (2) the find substring, (3) the 1st replacement substring, and (4) the 2nd replacement substring as arguments to the program. It allocates for the new string within the strreplace() function, makes the substitutions requested and returns a pointer to the new string to the calling function. The code is heavily commented to help you follow along, e.g.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* replace all instances of 'find' in 's' with 'r1' and 'r2', alternating.
 * allocate memory, as required, to hold string with replacements,
 * returns allocated string with replacements on success, NULL otherwise.
 */
char *strreplace (const char *s, const char *find,
                    const char *r1, const char *r2)
{
    const char *p = s,              /* read pointer into s */
        *sp = NULL;                 /* pointer to next 'find' substring */
    char *newstr = NULL,            /* newstring pointer to allocate/return */
        *np = NULL;                 /* pointer to newstring to fill */
    size_t newlen = 0,              /* allocated size of newstr */
        used = 0,                   /* amount of allocated space used */
        slen = 0,                   /* length of s */
        findlen = 0,                /* length of find string */
        r1len = 0,                  /* length of replace string 1 */
        r2len = 0;                  /* length of replace string 2 */
    int toggle = 0;                 /* simple 0/1 toggle flag for r1/r2 */

    /* validate BEFORE calling strlen: no NULL pointers, s and find not empty
     * (an empty find would match everywhere and never advance p)
     */
    if (!s || !find || !r1 || !r2 || !*s || !*find) {
        fputs ("strreplace() error: argument NULL or empty\n", stderr);
        return NULL;
    }
    slen = strlen (s);
    findlen = strlen (find);
    r1len = strlen (r1);
    r2len = strlen (r2);

    newlen = slen * 2;              /* double length of s for newstr */
    newstr = calloc (1, newlen);    /* allocate twice length of s */

    if (newstr == NULL) {           /* validate ALL memory allocations */
        perror ("calloc-newstr");
        return NULL;
    }
    np = newstr;                    /* initialize newpointer to newstr */

    /* locate each substring using strstr */
    while ((sp = strstr (p, find))) {   /* find beginning of each substring */
        size_t len = sp - p,            /* length to substring */
            rlen = toggle ? r2len : r1len,  /* length of replacement */
            need = used + len + rlen + 1;   /* bytes required, incl. nul */

        if (need > newlen) {            /* check if realloc needed */
            size_t newsize = newlen;
            void *tmp = NULL;

            while (newsize < need)      /* double until large enough */
                newsize *= 2;
            tmp = realloc (newstr, newsize);    /* realloc to temp */
            if (!tmp) {                 /* validate realloc succeeded */
                perror ("realloc-newstr");
                free (newstr);          /* original block is still valid */
                return NULL;
            }
            newstr = tmp;               /* assign realloc'ed block to newstr */
            newlen = newsize;           /* update newlen */
            np = newstr + used;         /* block may have moved, reset np */
        }
        memcpy (np, p, len);            /* copy text up to substring */
        np += len;                      /* advance newstr pointer by len */
        memcpy (np, toggle ? r2 : r1, rlen);    /* copy r2/r1 string to end */
        np += rlen;                     /* advance newstr pointer by rlen */
        *np = 0;                        /* nul-terminate */
        p = sp + findlen;               /* advance p past the substring */
        used += len + rlen;             /* update used characters */
        toggle = !toggle;               /* toggle 0,1,0,1,... */
    }

    /* handle segment of s after last find substring */
    slen = strlen (p);                  /* get remaining length */
    if (used + slen + 1 > newlen) {     /* check if realloc needed */
        void *tmp = realloc (newstr, used + slen + 1);  /* realloc */
        if (!tmp) {                     /* validate */
            perror ("realloc-newstr");
            free (newstr);
            return NULL;
        }
        newstr = tmp;                   /* assign */
        np = newstr + used;             /* reset np into the new block */
    }
    memcpy (np, p, slen + 1);           /* add final segment with its nul */

    return newstr;  /* return newstr */
}

int main (int argc, char **argv) {

    const char  *s = NULL,
                *find = NULL,
                *r1 = NULL,
                *r2 = NULL;
    char *newstr = NULL;

    if (argc < 5) { /* validate required no. of arguments given */
        fprintf (stderr, "error: insufficient arguments,\n"
                        "usage: %s <string> <find> <rep1> <rep2>\n", argv[0]);
        return 1;
    }
    s = argv[1];        /* assign arguments to pointers */
    find = argv[2];
    r1 = argv[3];
    r2 = argv[4];

    newstr = strreplace (s, find, r1, r2);  /* replace substrings in s */

    if (newstr) {   /* validate return */
        printf ("oldstr: %s\nnewstr: %s\n", s, newstr);
        free (newstr);  /* don't forget to free what you allocate */
    }
    else {  /* handle error */
        fputs ("strreplace() returned NULL\n", stderr);
        return 1;
    }

    return 0;
}

(above, the strreplace function uses pointers to walk ("inch-worm") down the original string making replacement, but you can use string indexes and index variables if that makes more sense to you)

(Also note the use of calloc for the original allocation. calloc allocates and sets the new memory to all zero, which can help ensure you don't forget to nul-terminate your string, but note that any memory added by realloc will not be zeroed unless you manually zero it with memset or the like. The code above manually terminates the new string after each copy, so you can use either malloc or calloc for the allocation.)
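If you do want memory added by realloc to be zeroed as well, one possible approach is sketched below. The helper name grow_zeroed and its interface are assumptions for illustration only; it reallocates through a temporary pointer and then clears just the newly added bytes with memset.

```c
#include <stdlib.h>
#include <string.h>

/* Grow *buf from oldsize to newsize bytes and zero the added bytes.
 * Returns 1 on success. On failure returns 0 and *buf is unchanged
 * and still valid, so the caller can free it.
 */
int grow_zeroed (char **buf, size_t oldsize, size_t newsize)
{
    void *tmp = NULL;

    if (newsize <= oldsize)         /* nothing to grow */
        return 1;

    tmp = realloc (*buf, newsize);  /* always realloc to a temporary */
    if (!tmp)
        return 0;

    *buf = tmp;
    memset (*buf + oldsize, 0, newsize - oldsize);  /* zero new bytes only */

    return 1;
}
```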

Example Use/Output

First example:

$ ./bin/str_substr_replace2 "^Hello world^" "^" "<s>" "</s>"
oldstr: ^Hello world^
newstr: <s>Hello world</s>

Second example:

$ ./bin/str_substr_replace2 "This is ~my house~ bud" "~" "<b>" "</b>"
oldstr: This is ~my house~ bud
newstr: This is <b>my house</b> bud

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through it.

$ valgrind ./bin/str_substr_replace2 "This is ~my house~ bud" "~" "<b>" "</b>"
==8962== Memcheck, a memory error detector
==8962== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8962== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==8962== Command: ./bin/str_substr_replace2 This\ is\ ~my\ house~\ bud ~ \<b\> \</b\>
==8962==
oldstr: This is ~my house~ bud
newstr: This is <b>my house</b> bud
==8962==
==8962== HEAP SUMMARY:
==8962==     in use at exit: 0 bytes in 0 blocks
==8962==   total heap usage: 1 allocs, 1 frees, 44 bytes allocated
==8962==
==8962== All heap blocks were freed -- no leaks are possible
==8962==
==8962== For counts of detected and suppressed errors, rerun with: -v
==8962== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have any further questions.

David C. Rankin