-1

The goal is to replace multiple (or all) occurrences of a given text in another string using only C strings.

(self answered question)

bleudoutremer
  • Possibly related: https://stackoverflow.com/questions/779875/what-function-is-to-replace-a-substring-from-a-string-in-c, https://stackoverflow.com/questions/39953972/c-replace-substring-in-string – Basya Jul 06 '21 at 10:53
  • What do you mean by "match"? A regular expression? Or something else? Do you need to expand back-references in the replacement? This question is unclear and far too broad. – Toby Speight Jul 07 '21 at 07:21
  • By 'match' I only meant "an occurrence of a given text that was found within another text". Edited it, and hopefully made it clearer. I intended to share code that was useful to me. – bleudoutremer Jul 07 '21 at 11:51

2 Answers

0

This uses fixed-size buffers, so you must make sure they are big enough to hold the string after the replacement is done.

Define the size before use:

#define LINE_LEN 256

This code was tested with MSVC 2019.

void replaceN(char* line,const char* orig,const char* new, int times){
    char* buf;
    if(times==0) return; //no time left, brother: nothing to replace
    
    if((times==-1||--times>0) && (buf = strstr(line,orig))!=NULL){ //find orig
        for(const char *c=orig;*c;c++) buf++; //advance buf
        replaceN(buf,orig,new,times); //repeat until the last occurrence
    }
    //this will run first for the last match
    if((buf = strstr(line,orig))!=NULL){ 
        char tmp[LINE_LEN];
        int i = buf-line; //pointer difference
        strncpy(tmp,line,i); //copy everything before the match
        for(const char *k=orig;*k;k++) buf++; //skip the matched text
        for(const char *k=new;*k;k++) tmp[i++]=*k; //copy replace chars
        for(;*buf;buf++) tmp[i++]=*buf; //copy the rest of the string

        tmp[i]='\0';
        strcpy(line,tmp);       
    }
}
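// static inline, so the wrappers also link with compilers that follow the C99
// inline rules (a plain inline definition provides no external definition)
static inline void replace(char* line,const char* orig,const char* new){replaceN(line, orig, new, 1);}
static inline void replaceAll(char* line,const char* orig,const char* new){replaceN(line,orig,new,-1);}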
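
Since the buffers are fixed size, it may help to check up front that the result will fit. The following is only a sketch under assumptions: `resultFits` is a hypothetical helper that is not part of the code above, the parameter is named `repl` instead of `new`, an empty `orig` is treated as "nothing to replace", and any negative `times` is treated like -1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINE_LEN 256

// Hypothetical helper (not part of the answer's code): returns true when
// replacing up to `times` occurrences of orig in line (-1 meaning all)
// keeps every string involved shorter than LINE_LEN, terminator included.
bool resultFits(const char *line, const char *orig, const char *repl, int times) {
    const size_t origlen = strlen(orig);
    const size_t repllen = strlen(repl);
    const size_t before = strlen(line);
    size_t after = before;

    if (origlen > 0) {
        const char *p = line;
        // Walk over non-overlapping matches from left to right
        while (times != 0 && (p = strstr(p, orig)) != NULL) {
            after = after - origlen + repllen; // one match swapped for repl
            p += origlen;
            if (times > 0) times--;
        }
    }
    // Intermediate results lie between these two lengths, so checking both
    // ends is sufficient (and slightly conservative).
    return before < LINE_LEN && after < LINE_LEN;
}
```

A caller could then guard the call, for example `if (resultFits(line, "x", "yy", -1)) replaceAll(line, "x", "yy");`, and handle the other case however suits the program.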
bleudoutremer
0

Turns out I had too much self-esteem. The code was not tested, and I should not have posted it without proper testing. I add this comment to remind others not to make the same mistake. If you find any other errors, please let me know.

In order to keep it simple, I don't do it in place. Instead, it requires a preallocated output buffer. Doing it in place is risky if the new string is longer than the original. There's also an edge case that can be tricky to handle: when the original substring to replace is a substring of the new string.
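
To illustrate that edge case: when replacing `"` with `\"`, a loop that restarts `strstr` from the beginning of the buffer keeps finding the quote it just inserted and never terminates. Below is a rough sketch of what an in-place variant would need to do, namely resume the search after the inserted text and check the capacity before every replacement. It is not the approach used in this answer; `replaceInPlace`, `cap` and `repl` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical in-place variant (not the approach used in this answer).
// buf holds a NUL-terminated string inside an array of cap bytes.
// Returns the number of replacements made; stops early as soon as the
// next replacement would no longer fit.
size_t replaceInPlace(char *buf, size_t cap, const char *orig, const char *repl) {
    const size_t origlen = strlen(orig);
    const size_t repllen = strlen(repl);
    size_t len = strlen(buf);
    size_t count = 0;

    if (origlen == 0) return 0; // an empty pattern would match forever

    char *pos = buf;
    while ((pos = strstr(pos, orig)) != NULL) {
        // Length after this replacement, plus the terminator, must fit
        if (len - origlen + repllen + 1 > cap) break;

        // Move the tail (terminator included) to where it will end up
        char *tail = pos + origlen;
        memmove(pos + repllen, tail, len - (size_t)(tail - buf) + 1);
        memcpy(pos, repl, repllen);

        len = len - origlen + repllen;
        pos += repllen; // resume after the inserted text, never inside it
        count++;
    }
    return count;
}
```

The `pos += repllen` line is what avoids the endless loop: the text that was just written is never searched again.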

The headers needed to run all this:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

The main replace function. It replaces at most n occurrences and returns the number of replacements. dest is a buffer big enough to hold the result. All pointers need to be non-NULL and valid. You may notice that I'm using goto, which may be frowned upon, but using it to exit cleanly is very convenient.

size_t replace(char *dest, const char *src, const char *orig, 
                const char *new, size_t n) {
    size_t ret = 0;

    // Maybe an unnecessary optimization to avoid multiple calls in 
    // loop, but it also adds clarity
    const size_t newlen = strlen(new);
    const size_t origlen = strlen(orig);

    if(origlen == 0 || n == 0) goto END; // Edge cases

    do {
        const char *match = strstr(src, orig);

        if(!match) goto END;
        
        // Length of the part of src before first match
        const ptrdiff_t offset = match - src;

        memcpy(dest, src, offset);           // Copy before match
        memcpy(dest + offset, new, newlen);  // Replace
    
        src  += offset + origlen; // Move src past what we have already copied.
        dest += offset + newlen;  // Advance pointer to dest to the end

        ret++;
    } while(n > ret);

END:
    strcpy(dest, src); // Copy whatever is remaining
    
    return ret;
}

It's easy to write a wrapper for the allocation. We borrow and modify some code from the question "find the count of substring in string":

size_t countOccurrences(const char *str, const char *substr) {
    
    if(strlen(substr) == 0) return 0;
    
    size_t count = 0;
    const size_t len = strlen(substr);

    while((str = strstr(str, substr))) {
       count++;
       str += len; // We're standing at the match, so we need to advance
    }

    return count;
}

Then some code to calculate the buffer size:

size_t calculateBufferLength(const char *src, const char *orig, 
                     const char *new, size_t n) {
    const size_t origlen = strlen(orig);
    const size_t newlen  = strlen(new);
    const size_t baselen  = strlen(src) + 1;

    if(origlen > newlen) return baselen; // Result shrinks, so the size of src is enough

    const size_t count = countOccurrences(src, orig);

    n = n < count ? n : count; // Min of n and count
    
    return baselen +
    n * (newlen - origlen);
}
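
As a worked example of the arithmetic: for src `!asdf!asdf!asdf!` (16 characters), orig `asdf` and a 9-character replacement `asdf#asdf`, there are 3 matches, so the size is 16 + 1 + 3 * (9 - 4) = 32 bytes. The sketch below is a hypothetical variant (`exactBufferLength` and `repl` are my names, not part of this answer) that also returns the exact size when the replacement is shorter, instead of falling back to the size of src.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical variant of the size calculation: exact size of the result,
// terminator included, for both growing and shrinking replacements.
size_t exactBufferLength(const char *src, const char *orig,
                         const char *repl, size_t n) {
    const size_t origlen = strlen(orig);
    const size_t repllen = strlen(repl);
    size_t size = strlen(src) + 1;

    if (origlen == 0) return size; // nothing can be replaced

    // Count non-overlapping matches, but never more than n
    size_t count = 0;
    for (const char *p = src; count < n && (p = strstr(p, orig)); p += origlen)
        count++;

    // Each replacement removes origlen bytes and adds repllen bytes
    if (repllen >= origlen) size += count * (repllen - origlen);
    else                    size -= count * (origlen - repllen);

    return size;
}
```

Whether the extra branch is worth it depends on how much the few saved bytes matter; returning the size of src for shrinking replacements is simpler and always sufficient.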

And the final function. It combines allocation and replacement. It returns a pointer to the buffer, and NULL if allocation fails.

char *replaceAndAllocate(const char *src, const char *orig, 
                          const char *new, size_t n) {
    const size_t size = calculateBufferLength(src, orig, new, n);
    char *buf = malloc(size);

    if(buf) replace(buf, src, orig, new, n);

    return buf;
}

And finally, a simple main with a few test cases

int main(void) {
    puts(replaceAndAllocate("hoho", "ha", "he", SIZE_MAX )); 
    puts(replaceAndAllocate("", "", "", 5));
    puts(replaceAndAllocate("", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "", 5));
    puts(replaceAndAllocate("", "", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 5));
    puts(replaceAndAllocate("hihihi!!!", "hi", "of", 2));
    puts(replaceAndAllocate("!!!hihihi", "hi", "x", 3));
    puts(replaceAndAllocate("asdfasdfasdf", "asdf", "x", 2));
    puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "y", SIZE_MAX ));
    puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "y", 0));
    puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "y", 1));
    puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "", SIZE_MAX ));
    puts(replaceAndAllocate("xxxxxxxxxxxx", "x", "", 3 ));
    puts(replaceAndAllocate("!asdf!asdf!asdf!", "asdf", "asdf#asdf", SIZE_MAX));

    // Yes, I skipped freeing the buffers to save some space
}

No warnings with -Wall -Wextra -pedantic and the output is:

$ ./a.out 
hoho



ofofhi!!!
!!!xxx
xxasdf
yyyyyyyyyyyy
xxxxxxxxxxxx
yxxxxxxxxxxx

xxxxxxxxx
!asdf#asdf!asdf#asdf!asdf#asdf!

Note that I don't have any special functions for replacing one and replacing all. If you really want those, just write wrappers with n=1 or n=SIZE_MAX. Using SIZE_MAX is safe, because a string cannot be bigger than that.
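
For completeness, such wrappers could look roughly like the sketch below. The names `replaceFirst` and `replaceAll` are hypothetical, and the allocating function is restated in a compact form (with the parameter called `repl`) only so that the block compiles on its own; it is a stand-in, not the exact code from above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Compact stand-in for replaceAndAllocate(), repeated here only so that
// this block is self-contained.
static char *replaceAndAllocate(const char *src, const char *orig,
                                const char *repl, size_t n) {
    const size_t origlen = strlen(orig), repllen = strlen(repl);
    size_t count = 0;

    // Count the matches that will actually be replaced (at most n)
    if (origlen > 0)
        for (const char *p = src; count < n && (p = strstr(p, orig)); p += origlen)
            count++;

    // Upper bound on the result size; exact when repllen >= origlen
    const size_t grow = repllen > origlen ? count * (repllen - origlen) : 0;
    char *buf = malloc(strlen(src) + 1 + grow);
    if (!buf) return NULL;

    char *dest = buf;
    for (; count > 0; count--) {
        const char *match = strstr(src, orig); // non-NULL: counted above
        const size_t offset = (size_t)(match - src);
        memcpy(dest, src, offset);            // copy the part before the match
        memcpy(dest + offset, repl, repllen); // write the replacement
        src  += offset + origlen;
        dest += offset + repllen;
    }
    strcpy(dest, src); // copy whatever is remaining
    return buf;
}

// Hypothetical wrappers; both return a malloc'd string, or NULL on failure.
char *replaceFirst(const char *src, const char *orig, const char *repl) {
    return replaceAndAllocate(src, orig, repl, 1);
}

char *replaceAll(const char *src, const char *orig, const char *repl) {
    return replaceAndAllocate(src, orig, repl, SIZE_MAX);
}
```

The caller owns the returned buffer, so outside of a throwaway test program it should be released: `char *s = replaceAll("a-b-c", "-", "+"); if (s) { puts(s); free(s); }`.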

Another reason that I got rid of a special function for one replacement is that it was very inefficient. Also, it was easier to write it that way and it is much cleaner.

I changed the code a lot from last time, and that's very much thanks to the awesome help I got at Codereview. You can see how the code was before on the question I posted there: https://codereview.stackexchange.com/q/263785/133688

klutt
  • This code didn't work for me but it seems a nice implementation! Using a `while` and `strstr` was my first idea but it entered an endless loop since I replaced **"** for **\"**. – bleudoutremer Jul 04 '21 at 17:03
  • @Vitorbnc There were a lot of issues, but I think they are corrected now – klutt Jul 04 '21 at 23:09
  • Why `goto` instead of `break`? – Deduplicator Jul 06 '21 at 20:46
  • @Deduplicator In this case I think it's clearer. Especially since I already have another `goto` earlier. Granted, in this case it would be equivalent. But it's idiomatic and instantly tells the reader that if it's a null pointer, then we should go to the final part of the code and exit. It's a bit like exceptions. – klutt Jul 06 '21 at 20:49
  • The `goto` is easily averted: https://coliru.stacked-crooked.com/a/8b85dd8f5c624285 – Deduplicator Jul 06 '21 at 20:58
  • @Deduplicator After correcting the typos I tried it, and I failed some of the test cases. – klutt Jul 06 '21 at 20:59
  • @Deduplicator If it can be written cleaner, then it's all good. But I see no point in getting rid of the goto just for the sake of it. This is a perfect example of when goto is good. – klutt Jul 06 '21 at 21:02
  • Fixed code (I think): https://coliru.stacked-crooked.com/a/5c9cd29f40fe673e If you actually want to handle `!orig` and `!new` somehow (I wouldn't), that must be done before using them the first time. – Deduplicator Jul 06 '21 at 21:04
  • That was a good point. Fixed it. But your code still fails. You know, you could try it before suggesting a change. Especially since I have provided nice code to do that :) – klutt Jul 06 '21 at 21:14
  • https://coliru.stacked-crooked.com/a/5edd446e7b38e46c – Deduplicator Jul 07 '21 at 01:23
  • @Deduplicator That seemed to do the trick. But I'm not really sure what you're trying to achieve. Is it "removing goto at all costs"? TBH, I think your code is less clear. You have a very complex condition in the for loop, and the check for `origlen` does not even belong there. It should be done before the loop, because it should be checked only once. In general, I prefer handling edge cases before the loop that does the main work. Sure, it does work because of logic, but it reduces readability. I'm also not very keen on the condition `n-- && ((match = strstr(src, orig)))`. Too complex. – klutt Jul 07 '21 at 07:32
  • Well, pushing `origlen` into the condition of that loop might have been a bit too terse, yes, should really be an enclosing if. But the rest of the condition is really simple enough. Anyway, have fun. – Deduplicator Jul 07 '21 at 10:09
  • @Deduplicator It's not OVERLY complex. I don't really have any complaints about it. It's just a personal preference. In general, I want conditions to not have side effects. But of course I make exceptions too. Like `while(*str++);` to get a pointer to the terminator. – klutt Jul 07 '21 at 10:28