
I'm trying to write a function in C that takes a string as input and returns the same string, but with "&", "<" and ">" replaced by "&amp;", "&lt;" and "&gt;".

I'm struggling to understand how I can do this.

I have tried looping over the string and comparing each character with the symbol using strcmp. If the character matches, I try to replace it with the corresponding entity.

Some code to show what I've been trying:

#include <stdio.h>
#include <string.h>

char *replace_character(char *str) {
  for(size_t i = 0; i <= strlen(str); i++) {
    if(strcmp(str[i], '&') {
      str[i] = "&amp;";
    }
    ... (same procedure for the rest of the characters)
  }
  return str;
}


int main() {
 char with_symbol[] = "this & that";

 printf(replace_character(with_symbol));
}

Expected result: "this &amp; that"

Svele
  • You can't replace the characters in place (because the lengths are different). See [this question](https://stackoverflow.com/questions/779875/what-is-the-function-to-replace-string-in-c) for some possible solutions. – Federico klez Culloca Sep 12 '19 at 10:53
  • You have to allocate new memory with enough size for the new string. Then copy all the content into the new memory. Remember to `free` it. – KamilCuk Sep 12 '19 at 11:07
  • @KamilCuk Could this be solved by giving the array a bigger predefined size, e.g. (char with_symbol[50] = ....)? – Svele Sep 12 '19 at 11:24
  • It could be solved that way, as long as the "bigger size" is enough to hold all the characters. Then the operation `str[i] = "&amp;";` becomes "move all characters after `i` 4 bytes to the right (memmove) and copy the `"&amp;"` characters into position `i` (memcpy)". – KamilCuk Sep 12 '19 at 11:26
  • @KamilCuk I'm afraid I'm not quite following this. If the first if-statement hits (`strcmp(str[i], '&') == 0`), do I need to make the memmove and memcpy calls myself, or is this implicit done by the statement that is already written? – Svele Sep 12 '19 at 11:34
  • `implicit done` - this is C. Nothing is implicitly done. Also, `strcmp(str[i], '&')` is invalid; you can't compare characters with `strcmp`. – KamilCuk Sep 12 '19 at 13:43
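
The in-place approach discussed in the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the names `entity_for` and `escape_in_place` are hypothetical, and it assumes the caller passes the total capacity of the buffer. Each special character is replaced by shifting the tail of the string to the right with `memmove` and then copying the entity into the gap with `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the entity for c, or NULL if c is left as-is. */
static const char *entity_for(char c)
{
    switch (c) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    default:  return NULL;
    }
}

/*
 * Hypothetical function: escapes '&', '<' and '>' inside buf itself.
 * cap is the total size of buf in bytes, including room for the '\0'.
 * Returns 0 on success, or -1 (leaving buf untouched) if the result
 * would not fit.
 */
int escape_in_place(char *buf, size_t cap)
{
    size_t len, needed, i;

    assert(buf != NULL);
    len = strlen(buf);

    /* Check up front that the escaped string fits into the buffer. */
    needed = len + 1;
    for (i = 0; i < len; ++i) {
        const char *rep = entity_for(buf[i]);
        if (rep != NULL)
            needed += strlen(rep) - 1;
    }
    if (needed > cap)
        return -1;

    i = 0;
    while (i < len) {
        const char *rep = entity_for(buf[i]);
        size_t n;

        if (rep == NULL) {
            ++i;
            continue;
        }
        n = strlen(rep);
        /* Shift everything after position i (including the '\0')
           to the right by n - 1 bytes; the regions overlap, so
           memmove is required rather than memcpy. */
        memmove(buf + i + n, buf + i + 1, len - i);
        /* Copy the entity into the gap (without its own '\0'). */
        memcpy(buf + i, rep, n);
        len += n - 1;
        i += n;  /* skip the entity so its '&' is not escaped again */
    }
    return 0;
}
```

With `char with_symbol[50] = "this & that";`, calling `escape_in_place(with_symbol, sizeof with_symbol)` should leave "this &amp; that" in the array. Every replacement shifts the rest of the string, so this gets slow for long inputs with many special characters; allocating a new string avoids that.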

1 Answer


The concept of a string in C is a low-level one: an array of characters. Just as you cannot take an array of integers and directly replace one of its integers with a whole other array, you cannot directly replace a character of a string with another string. You must first allocate the necessary memory for the extra characters that you want to jam into your original string.

Below is some code that does that. It isn't the most efficient, but it gives you an idea of how this should work. It makes two passes over the string: the first counts the special symbols to work out how much extra space is needed, and the second copies the characters into the new buffer.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *replace(const char *s)
{
    size_t i, j;
    size_t len, extra;
    char *r = NULL;

    len = strlen(s);
    extra = 0;

    /* First we count how much extra space we need */
    for (i = 0; i < len; ++i) {
        if (s[i] == '&')
            extra += strlen("&amp;") - 1;
        else if (s[i] == '<')
            extra += strlen("&lt;") - 1;
        else if (s[i] == '>')
            extra += strlen("&gt;") - 1;
    }

    /* Allocate a new string with the extra space */
    r = malloc(len + extra + 1);
    assert(r != NULL);

    /* Put in the extra characters */
    j = 0;
    for (i = 0; i < len; ++i) {
        if (s[i] == '&') {
            r[j++] = '&';
            r[j++] = 'a';
            r[j++] = 'm';
            r[j++] = 'p';
            r[j++] = ';';
        } else if (s[i] == '<') {
            r[j++] = '&';
            r[j++] = 'l';
            r[j++] = 't';
            r[j++] = ';';
        } else if (s[i] == '>') {
            r[j++] = '&';
            r[j++] = 'g';
            r[j++] = 't';
            r[j++] = ';';
        } else {
            r[j++] = s[i];
        }
    }

    /* Mark the end of the new string */
    r[j] = '\0';

    /* Just to make sure nothing fishy happened */
    assert(strlen(r) == len + extra);

    return r;
}

int main(void)
{
    const char *sorig = "this &, this >, and this < are special characters";
    char *snew;

    snew = replace(sorig);

    printf("original  :  %s\n", sorig);
    printf("     new  :  %s\n", snew);

    free(snew);

    return 0;
}

A better strategy would be to define a lookup table or map, so that you can add or remove pairs of symbols and their replacements just by changing the table. You can also use strncpy (or memcpy) for this, avoiding the character-by-character treatment. The example above is just to illustrate what goes on under the hood.
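
As a rough sketch of the lookup-table idea, something like the following could work. The names `struct entity`, `entities`, `lookup` and `replace_with_table` are made up for illustration, and it uses `memcpy` rather than `strncpy` because the length of each replacement is already known. Unlike the example above, it returns NULL on allocation failure instead of asserting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: a character and the text that replaces it. */
struct entity {
    char symbol;
    const char *replacement;
};

/* Add or remove rows here to change which characters are escaped. */
static const struct entity entities[] = {
    { '&', "&amp;" },
    { '<', "&lt;"  },
    { '>', "&gt;"  },
};

/* Returns the replacement for c, or NULL if c is not in the table. */
static const char *lookup(char c)
{
    size_t k;

    for (k = 0; k < sizeof entities / sizeof entities[0]; ++k)
        if (entities[k].symbol == c)
            return entities[k].replacement;
    return NULL;
}

/*
 * Returns a newly allocated, escaped copy of s, or NULL if malloc fails.
 * The caller is responsible for calling free() on the result.
 */
char *replace_with_table(const char *s)
{
    size_t i, j = 0, total = 0;
    char *r;

    assert(s != NULL);

    /* First pass: work out the length of the escaped string. */
    for (i = 0; s[i] != '\0'; ++i) {
        const char *rep = lookup(s[i]);
        total += rep ? strlen(rep) : 1;
    }

    r = malloc(total + 1);
    if (r == NULL)
        return NULL;

    /* Second pass: copy plain characters and whole replacements. */
    for (i = 0; s[i] != '\0'; ++i) {
        const char *rep = lookup(s[i]);
        if (rep) {
            size_t n = strlen(rep);
            memcpy(r + j, rep, n);
            j += n;
        } else {
            r[j++] = s[i];
        }
    }
    r[j] = '\0';
    return r;
}
```

It can be called the same way as `replace` in the `main` above. Escaping another character, such as the double quote, would then only need one more row in `entities`.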

gustgr
  • Thank you for the comment and the example code. However, this code does not remove the ">" and "<" when they are replaced by the entities, so I end up with "&lt;<" instead of "&lt;". – Svele Sep 12 '19 at 13:22
  • That is true. I had misunderstood. The code required minimal changes though. It has been updated. – gustgr Sep 12 '19 at 13:41
  • This is now giving "malloc(): corrupted top size" error. – Svele Sep 12 '19 at 13:50
  • Could you please show the input that caused that error? – gustgr Sep 12 '19 at 13:56
  • EDIT: Sorry, there was one other line which had not been removed. It does work now! – Svele Sep 12 '19 at 13:58
  • What compiler are you using? And which compiler flags? The code is fully compliant with ANSI C, so you shouldn't have a problem. – gustgr Sep 12 '19 at 13:59
  • See my edited comment, thank you for your response gustgr! – Svele Sep 12 '19 at 14:01
  • Good to hear. Mind you this is not production code, just an example to show you how C handles strings. – gustgr Sep 12 '19 at 14:02