Inplace string replacement in C

Question

Write a function

void inplace(char *str, 
             const char pattern, 
             const char* replacement, 
             size_t mlen)

Input:
str: a string ending with \0. the input indicates that we need an inplace algorithm.

pattern: a letter.

replacement: a string.

mlen: the size of the memory holds the string str starts from the beginning of the memory and that mlen should be larger than strlen(str)

The final result is still pointed by str.

Note that all occurrence of pattern should be replaced.

For example,

helelo\0...........

Here "helelo" is the string to replace with '\0' at the end. After '\0' there are still L valid bytes. We want to replace "e" by "123".

A simple approach works like this, we go through str, when a pattern is matched, we shift all the rest with the place to fill the replacement string, then replace the pattern by the replacement.

If the original string is with length n and contains only e, we need (n-1) + (n-2) + ... + 1 shifts.

Is there an algorithm that scans the string with only one pass and constant memory cost?

"If the original string is with length n and contains only e, we need 2(n-1) + 2(n-2) + ... + 2 shifts". No, that's not correct. Each letter is only shifted once. Example: "abcdef". Shift right one letter means, Copy 'f' one char down, copy 'e' one char down, etc. You are working from the end of the string. It does not mean continuously scanning from the front of the string as you have implied. — kaylum, Sep 01 '15 at 00:59
If the original string is eeeec, the new string should be 123123123123c. If we know the length of the new string, we can move c directly to the final position, then add 123 in front one by one. W/o knowing the length, when the first e is matched, we shift all the rest eeec 2 bytes right, this needs 4 moves. When we meet the second e, we need another 3 moves.. — Joe C, Sep 01 '15 at 01:07
Ok, you are right. But that's because your question is still not clearly specified (even though you have started a new question). Well, at least not clear to me that you wanted to replace all occurences (yes, I probably should have assumed that - but that's why requirements should always be clearly and explicitly specified and not leave room for assumptions). — kaylum, Sep 01 '15 at 01:10

user3386109 · Accepted Answer · 2015-09-01T17:17:09.367

4

I think two passes is the minimum. On the first pass, count the number of characters that will be replaced. Given that count and the length of the replacement string, you can compute the length of the final string. (And you should verify that it's going to fit into the buffer.)

On the second pass, you scan the string backwards (starting at the last character), copying characters to their final location. When you encounter the search character, copy the replacement string to that location.

In your example, the increase in length would be 2. So you would

copy str[5] which is '\0' to str[7]
copy str[4] which is 'o' to str[6]
copy str[3] which is 'l' to str[5]
copy str[2] which is 'l' to str[4]
at str[1] you find the 'e' so str[3]='3' str[2]='2' str[1]='1'

At this point the output index is the same as the input index, so you can break the loop.

As @chux pointed out in the comments, the cases where the replacement string is either empty, or has exactly one character, can be handled with a single forward pass through the string. So the code should handle those cases separately.

edited Sep 01 '15 at 17:17

answered Sep 01 '15 at 01:11

user3386109

34,287
7
49
68

I have a question here. You are assuming while scanning the string backwards, we would need to shift the characters by the length of replacement string. However, if consecutive 'e' are present then this would not be the case, then we would need to shift by the consecutive 'e' times the length of replacement string. Since you are scanning backwards, there is no way of knowing that the 'e' are consecutive or not. We are just aware of the count of 'e'. – Sumit Trehan Sep 01 '15 at 06:43
@SumitTrehan It doesn't matter whether the 'e' are consecutive or not. In the example where the replacement string has length 3, each 'e' adds 2 to the length of the string, since one character 'e' is being replaced by three "123". – user3386109 Sep 01 '15 at 06:50
I assume you mean to say that we can compute the length of the final string, then we should move the last character to the computed length-1, then carry on the shifting for every other character! – Sumit Trehan Sep 01 '15 at 06:55
@SumitTrehan Yes, *"compute the length of the final string"*, *"copy characters to their final location"* is what I said. – user3386109 Sep 01 '15 at 06:58
Disagree with "two passes is the minimum" as demonstrated with this [answer](http://stackoverflow.com/a/32322837/2410359). Unless I was obliged to use a single pass solution, this answer does outline a good approach. Corner weakness: the order of copying depends on the length of the pattern replacement. If `strlen(replacement) >= 1` this answer works. But is `*replacement == 0`, on the second pass, code should scan/replace the string forwards. – chux - Reinstate Monica Sep 01 '15 at 16:46
@chux I agree that the corner cases where `strlen(replacement)` is 0 or 1 can be handled with a single forward pass. I have reviewed your answer and disagree that your algorithm is single pass. I may post a comment on your answer explaining. – user3386109 Sep 01 '15 at 16:54

score 1 · Answer 2 · answered Sep 01 '15 at 02:57

1

A candidate single pass solution.

For each character in str, recurse. After the recursion, do the replacement.

It does recurse heavily.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

// return 0:success else 1:fail
static int inplace_help(char *dest, const char *src, int pattern,
        const char* replacement, size_t rlen, size_t mlen) {
  printf("'%p' '%s' %c\n", dest, src, pattern);
  if (*src == pattern) {
    if (rlen > mlen) return 1;
    if (inplace_help(dest + rlen, src + 1, pattern, replacement, rlen,
            mlen - rlen)) return 1;
    memcpy(dest, replacement, rlen);
    return 0;
  }
  if (mlen == 0) return 1;
  int replace1 = *src;
  if (*src) {
    if (inplace_help(dest + 1, src + 1, pattern, replacement, rlen, mlen - 1)) {
      return 1;
    }
  }
  *dest = replace1;
  return 0;
}

void inplace(char *str, const char pattern, const char* replacement,
        size_t mlen) {
  if (pattern == 0) return;
  if (mlen == 0) return;
  if (*replacement == 0) return;  // Insure str does not shrink.
  inplace_help(str, str, pattern, replacement, strlen(replacement), mlen - 1);
}

int main(void) {
  char str[1000] = "eeeeec";
  inplace(str, 'e', "1234", sizeof str);
  printf("'%s'\n", str);  // --> '12341234123412341234c'
  return 0;
}

answered Sep 01 '15 at 02:57

chux - Reinstate Monica

143,097
13
135
256

Interesting design. It uses one pass indeed, although its real memory cost is non constant because of recursion. – Joe C Sep 01 '15 at 16:27
Actually the strlen called in the very beginning needs one pass to return value. Can we do it without calling strlen? – Joe C Sep 01 '15 at 16:29
@Joe C 1) The memory need is proportional to the `strlen(str)`. 2) It is still one-pass. 1 pass through `str` with `inplace_help()` and 1 pass though `replacement` with `strlen(replacement)`. It is not 2-passes though the same string. – chux - Reinstate Monica Sep 01 '15 at 16:35
@Joe 3) This is an example where 1-pass _can_ be done, yet a 2-pass as suggested by others is likely more practical. It really is not the number of passes that is important, but the order of complexity, `O()`, of the algorithm in execution and memory usage. I did this to show how 1-pass could occur. – chux - Reinstate Monica Sep 01 '15 at 16:36
@chux I disagree that this is a single pass algorithm. On the way down, the line `int replace1 = *src;` copies the entire string to the stack one character at a time (with about 32 bytes of additional stack memory overhead per character). On the way back up, the line `*dest = replace1;` copies the saved characters back to the original string. So I see this algorithm as equivalent to [this answer](https://stackoverflow.com/a/32322787/3386109) by B.Nadolson that constructs the new string in a working buffer, and then copies the final string back to the original string. – user3386109 Sep 01 '15 at 17:29
1

@user3386109 Code like `for(i=0;i – chux - Reinstate Monica Sep 01 '15 at 17:53
@chux I certainly agree that the algorithm is O(n) time and O(n) space. So the time complexity is the best it can be. I didn't comment yesterday since you said right up front, "It *does* recurse heavily." So I figured that this answer is mostly for fun, and I'll leave it at that :) – user3386109 Sep 01 '15 at 18:19

score 0 · Answer 3 · answered Sep 01 '15 at 05:47

The following assumes that the memory allocated to the string has been initialized to something at some point in time, since standard C does not seem to allow access to uninitialized memory. In practice, it will work fine.

It does precisely two scans: the first one is over the entire allocated space, and moves the string to the right-hand edge of the space. The second scan is over the string itself, which it moves back to the left-hand edge while it does replacements.

I changed the prototype to return 0 on success; -1 on failure. I also allow the pattern to be a string. (Maybe a single character was intentional? Easy to change, anyway.) (As written, pattern must not be length zero. Should be checked.)

int inplace(char *str, 
            const char* pattern, 
            const char* replacement, 
            size_t mlen) {
  /* We don't know how long the string is, but we know that it ends
     with a NUL byte, so every time we hit a NUL byte, we reset
     the output pointer.
   */
  char* left = str + mlen;
  char* right = left;
  while (left > str) {
    if (!*--left) right = str + mlen;
    *--right = *left;
  }

  /* Naive left-to-right scan. KMP or BM would be more efficient. */

  size_t patlen = strlen(pattern);
  size_t replen = strlen(replacement);
  for (;;) {
    if (0 == strncmp(pattern, right, patlen)) {
      right += patlen;
      if (right - left < replen) return -1;
      memcpy(left, replacement, replen);
      left += replen;
    } else {
      if (!(*left++ = *right++)) break;
    }
  }
  return 0;
}

I think c only does not allow reading from uninitialized value. It should be fine if we write before read. — Joe C, Sep 01 '15 at 16:32
@JoeC: That's correct. However, the above code does read before it writes, since it reads the entire allocated region for the given string, and not just the bytes which form part of the string. That would be fine if the region had been originally allocated with `calloc`, but it might not be fine if the region had been originally allocated with `malloc`. You would only notice the issue if you ran the code with a tool like valgrind which detects uninitialized reads. — rici, Sep 01 '15 at 16:49

Inplace string replacement in C

3 Answers3

Linked