2

I'm trying to develop a function in a library that looks for the caracters '<', '>' and '&'; and substitutes them for &lt;, &gt; and &amp;.

So far, I've developed this function:

char *sXMLspecialReplace(char *rawString) {
  char *old_pointer = rawString;
  int new_index = 0;
  /*In most cases there won't be character substitution, so we can
  start allocating with the original length and only reallocate if
  a special char is spotted, therefore, memory reallocation will
  take place in few circumstances.
  + 1 for NULL terminator.*/
  int result_size = strlen(rawString) + 1;
  char *result = (char *) malloc(result_size);
  do {
    switch (*old_pointer) {
      case '&':
        result_size += 4; // From 1 to 5
        result = (char *) realloc(result, result_size);
        strcpy(result + new_index, "&amp;");
        new_index += 5;
        break;
      case '<':
        result_size += 3; // From 1 to 4
        result = (char *) realloc(result, result_size);
        strcpy(result + new_index, "&lt;");
        new_index += 4;
        break;
      case '>':
        result_size += 3; // From 1 to 4
        result = (char *) realloc(result, result_size);
        strcpy(result + new_index, "&gt;");
        new_index += 4;
        break;
      default:
        //Most frequent case
        result[new_index++] = *old_pointer;
    }
    old_pointer++;
  } while (*old_pointer != '\0');
  /*do-while ensures the function exits with a null terminator*/
  return result;
}

While it works in some cases, there are others when it doesn't work. In a test I tried to do the following:

char *my_company = "HELLO INDUSTRIES EUROPE";
char *xml_company = sXMLspecialReplace(my_company);

sprintf(sXMLanswer,
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
  "<company>%s</company>\n",
  xml_company);

printf(sXMLanswer);

OUTPUT

<?xml version="1.0" encoding="UTF-8"?>
<company>HELLO INDUSTRIES EUROPE[[[[[</company>

EDIT 1: format &lt;, &gt; and &amp;.

ArthurTheLearner
  • 315
  • 4
  • 12
  • First of all, please read [this discussion about casting the result of `malloc` (and friends like `realloc`)](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). Secondly, never reassign back to the pointer variable you pass to `realloc`, if `realloc` fails your original pointer will be lost. Thirdly, don't forget that `char` strings in C are really called ***null-terminated** byte strings*. That *null-termination* is important, and requires an extra byte, making a four-character string need space for five characters. – Some programmer dude Mar 09 '18 at 10:05
  • 1
    I agree with the first two points. However, I think the third point is taken into account in the initial `malloc`, which uses the lenght of the parameter `raw_string` + 1. – ArthurTheLearner Mar 09 '18 at 10:12
  • If I'm not mistaken, I do. When the pointer gets to the NULL terminator of `raw_string`, this NULL character is copied into `result`, then the exit condition is met, and the function returns. Am I doing it wrong? – ArthurTheLearner Mar 09 '18 at 10:21
  • The space for the terminating character was included properly: `int result_size = strlen(rawString) + 1;` – Gerhardh Mar 09 '18 at 11:13

2 Answers2

3

You do not nul-terminate your copy of the string.

do
{
  switch (*old_pointer)
  {
   ...
    default:
      //Most frequent case
      result[new_index++] = *old_pointer;
  }
  old_pointer++;
} while (*old_pointer != '\0');

In your switch you copy the characters. Afterwards you advance your source pointer. And then you check if the advanced pointer points to '\0'. This means after the last character of the string you leave the do while loop and do not copy the terminating '\0' byte.

This leaves random garbage after your copy of the string.

You might fix it like this:

  ...
  }
  // No incement here!
} while (*old_pointer++ != '\0');
Gerhardh
  • 11,688
  • 4
  • 17
  • 39
1

It's probably better(1) to do what you want in three separate step :

  • 1 : get the len of the final result
  • 2 : Allocate it
  • 3 : fill it

To get the final len, it's pretty simple : You just iterate over the string and each time you see the character you want to replace, you increase the len accordingly.

To fill it, it's pretty simple too : You iterate over the string again, and if you see a character, you add the substitute text instead of the original char.

Given what you have already did, I think this algorithm will be a piece of cake for you.

If you want to keep it realy basic, you can harcode everything (like you already did), or you can use structure to allow a more evolutif code (like addind another character to replace).

Edit : (1) I say "better" because you may end up doing a lot of realloc/malloc at runtime, which is not really efficient at all. Futhermore, each time you want to add a replacement character (I don't know if XML have other than "&<>"), you will have to do a realloc, which will increase your code for nothing. Imagine having 23 charactere to replace ... Your code will end up in a hell of maintenance.


typedef struct swap {
    char   src;
    char   *dst;
    size_t dstLen;
} swap_s;

swap_s *SwapList_FoundBySrc(swap_s *list, char searched)
{
    for (size_t i = 0; list[i].src; ++i) {
        if (list[i].src == searched) {
            return (&list[i]);
        }
    }
    return (NULL);
}

char *PlainStringToXmlString(char *input)
{
    char   *output = NULL;
    size_t len     = 1; // Final '\0'
    swap_s list[]  = {{'&',  "&amp;", strlen("&amp;")},
                      {'<',  "&lt;",  strlen("&lt;")},
                      {'>',  "&gt;",  strlen("&gt;")},
                      {'\0', NULL,    0}};
    swap_s *swap   = NULL;

    // Get the final len sum
    for (size_t i = 0; input[i]; ++i) {
        if ((swap = SwapList_FoundBySrc(list, input[i]))) {
            len += swap->dstLen;
        } else {
            ++len;
        }
    }

    // Allocate it
    if (!(output = malloc(len))) {
        // Log ?
        return (NULL);
    }
    output[len - 1] = '\0';

    // Fill it
    size_t j = 0;
    for (size_t i = 0; input[i]; ++i) {
        if ((swap = SwapList_FoundBySrc(list, input[i]))) {
            strcpy(output + j, swap->dst);
            j += swap->dstLen;
        } else {
            output[j] = input[i];
            ++j;
        }
    }

    return (output);
}
Tom's
  • 2,448
  • 10
  • 22