Search and replace c-style strings in c++

Question

I am trying to write a function that takes three c-style strings, and returns a c-style string. This function searches a c-string for all occurrences of the sub-string and replaces them with a different string.
This program works but seems very inelegant. I can't help the feeling like it could have been done in a less bulky way.

char* replaceSubstring(char *original, char *from, char *to)
{
     int origlen = strlen(original);
     int i = 0;
     int count = 0;
     char *ptr;

     //figure out how many times the sub-string occurs in a string.
     //i couldn't figure out a way to avoid this loop
     while (i<origlen)
     {
           ptr = strstr(original+i, from);
           if (!ptr)
               break;
           else
           {
               i = ptr - original + 1;
               count++;
           }
     }
     //figure out what the size of the output string has to be
     int newsize = origlen + (strlen(to) - strlen(from)) * count;

     char *newstring = new char[newsize];  
     newstring[0] = '\0';  
     i = 0;
     while (i < origlen)
     {
          ptr = strstr(original+i, from);
          if (!ptr)
          {
               strcat(newstring,original+i);
               break;
          }
          else
          {
               //this looks extremely ugly and bulky...
               strncat(newstring, original+i, ptr-(original+i));
               strcat(newstring, to);
               i = i + ptr - (original + i) + strlen(from);
          }
     }
     strcat(newstring,"\0");
     return newstring;
}

Would anyone have any suggestions on how to make this code clearer and/or more efficient ? Any comments are welcome. Please do not suggest to use class string instead. That is not an option. The function must work with c-strings

Using `strcat` so many times is probably problematic for performance because, for every call, you have to iterate the entire string. Also, `strcat(newstring, "\0")` effectively does nothing because `newstring` must be NULL-terminated to be used with `strcat` anyway. — Seth Carnegie, Apr 24 '12 at 23:07
You cannot use the string class? Such a requirement makes me wonder if this is homework... is it? — fbrereto, Apr 24 '12 at 23:27
No this is not homework, I just realized that I haven't worked with c-strings in a while, and I had to get a refresher on how they work. So I created a few exercises for myself just to see that I can do same basic things that I can do with std::string — MadOgre, Apr 24 '12 at 23:46
i am trying to get maximally effective code using only c facilities (except cout and new of course :) ) — MadOgre, Apr 24 '12 at 23:48
I put the `c` tag back after the OP's reasonable explanation. — ildjarn, Apr 25 '12 at 00:00
The overall verbosity and simplicity of the function is reasonable - it's basically as messy as the problem. Seth's comments are entirely valid (note that strcat will keep the output terminated - there's no need to `strcat(newstring, "\0")` anyway, and that you can avoid the degrading performance Seth warns of by calling `str[n]cat(newstring + characters_so_far, ...`). Additionally, when you `new` you must allocate an extra byte for the NUL terminator, and consider calling `strlen(from)` once outside the `while` loop and saving the value. Your function parameters should be `const`. — Tony Delroy, Apr 25 '12 at 01:27
Perhaps this sounds harsh, but, while I respect your desire to re-learn pointers, I believe [tutorials](https://www.google.com/#hl=en&sclient=psy-ab&q=c-style+string+tutorial&oq=c-style+string+tutorial) are a better place for this than SO. Also, there is this [possible duplicate](http://stackoverflow.com/q/779875/777186). — jogojapan, Apr 25 '12 at 01:41

Seth Carnegie · Answer 1 · 2012-04-25T18:20:42.150

One improvement I would make that would probably improve elegance and efficiency simultaneously would be to

Allocate an array of integers that will hold the indices of the substrings that match the given string.
Loop through the string and find all the matching substrings, and add each to the array, reallocating the array larger as needed (because you don't want to use the STL I presume; if you can, use ~~std::vector~~ ~~std::list~~ std::deque).
Allocate new memory for the modified string based on the length of the original string and how many substrings you found.
Iterate the old string and the array simultaneously, copying the non-matched parts from the old string to the new.
Fill in the holes you left with the replacement string.

Also, instead of allocating memory dynamically inside the function, I would change it to accept a caller-allocated buffer and maximum buffer size instead. This way the caller can be completely responsible for the lifetime of the memory (utilising automatic memory if they want/can) and you don't have to worry about calculating a buffer size (you rely on the caller for that).

EDIT:

Here is an example implementation I whipped up. Please let me know if anyone finds any errors, which is likely. (You might not want to read this if you want to figure it out yourself.)

char* strreplace(const char* haystack, const char* needle, const char* replacement) {
    // using deque for pop_front
    std::deque<const char*> positions;
    unsigned int haystacklen    = strlen(haystack),
                 needlelen      = strlen(needle),
                 replacementlen = strlen(replacement);

    for (const char* cur = haystack, *pos = strstr(cur, needle); pos; cur = pos + 1, pos = strstr(cur, needle))
        positions.push_back(pos);

    char* newstr    = new char[haystacklen + replacementlen * positions.size() + 1],
          dst       = newstr;
    const char* src = haystack;

    while (src <= haystack + haystacklen)
        if (!positions.empty() && src == positions.front()) {
            strcpy(dst, replacement);
            dst += replacementlen;
            src += needlelen;
            positions.pop_front();
        } else
            *dst++ = *src++;

    return newstr;
}

And don't forget to delete[] the return value of that function.

I went for efficiency without doing maximum optimisations. For instance, you could have a while loop that executed while positions.empty() was false, and then when it becomes true, just exit the loop and do a straight strcpy for the rest because there are no more replacements to be made, which would let you avoid unnecessarily calling positions.empty() for every character even if there are no replacements to be made left, or at all. But I think that is a small nit, and the code conveys the point.

Also, I used ~~std::list~~ std::deque to remove all the array management code but that should be straighforward if you want to do it yourself.

As ildjarn mentioned in the comments, I changed from list to deque because I use the size member and, per his comment, it's not O(1) (usually it would be O(n)) on all pre-C++11 implementations, so deque with it's constant-time size will be more efficient.

`std::list<>::size()` is not O(1) on all pre-C++11 implementations; I would recommend using `std::deque<>` instead of `std::list<>`. — ildjarn, Apr 25 '12 at 18:07
@ildjarn Thanks, I updated the code and added a comment to that effect. — Seth Carnegie, Apr 25 '12 at 18:14

score 0 · Answer 2 · answered Apr 25 '12 at 01:43

You can get rid of the first part of your code to calculate the count if you simply set the size of the newstring to be the maximum possible size after the solution.

In particular:

int newsize = origlen + (strlen(to) - strlen(from)) * origlen/strlen(from);

Also, instead of calling strlen(from) multiple times, just assign it to a variable (e.g. srtlen_from) and just use that.

Jesse Good · Answer 3 · 2012-04-25T02:39:57.637

Here is a version I made which is pretty much using pointers only (error checking, etc. is omitted) (I have also noticed that it fails in certain cases):

#include <cstring>
#include <cstdlib>
#include <iostream>

char* replaceSubstring(char *original, char *from, char *to)
{
// This could be improved (I was lazy and made an array twice the size)
    char* retstring = new char[std::strlen(original) * 2];

    int pos = 0;
    for (int i = 0; *(original + i); ++i)
    {   
        if (*(original + i) == *(from)) 
        {
            // Got a match now check if the two are the same
            bool same = true; // Assume they are the same
            for (int j = 1, k = i + 1; *(from + j) && *(original + k); ++j, ++k)
            {
                if (*(from + j) != *(original + k))
                {
                    same = false;
                    break;
                }
            }
            if (same)
            {
                // They are the same now copy to new array
                for (int j = 0; *(to + j); ++j)
                {
                    retstring[pos++] = *(to + j);
                }
                i += std::strlen(from) - 1;
                continue;
            }
        }
        retstring[pos++] = *(original + i);
    }
    retstring[pos] = '\0';
    return retstring;
}

int main()
{
    char orig1[] = "Replace all the places that say all";
    char* r1 = replaceSubstring(orig1, "all", "Replacement");
    std::cout << r1 << std::endl;
    delete [] r1;

    char orig2[] = "XXXXXX with something else XXXXXX";
    char* r2 = replaceSubstring(orig2, "XXXXXX", "hello");
    std::cout << r2 << std::endl;
    delete [] r2;
}

std''OrgnlDave · Answer 4 · 2012-04-25T20:53:24.053

0

Self-unexplanatory: http://ideone.com/ew5pL

This is what ugly and bulky looks like - no C functions except an strlen and a memcpy at the end.

I think yours looks nice and compact.

edited Apr 25 '12 at 20:53

answered Apr 25 '12 at 20:41

std''OrgnlDave

3,912
1
25
34

Search and replace c-style strings in c++

4 Answers4