2

I need to write a function which will remove all duplicates sub-strings from the string, the function below do than job but not so correct.

Input: This is a simple test for lesson2 Quit lesson2

Output: This simple test for lesson2 Quit

As you can see function remove "is" from the sentence but it is not correct.

void RemoveDuplicates(char text[], size_t text_size, char** output)
{
    char *element;
    /* Allocate size for output. */
    *output = (char*) malloc(text_size);
    *output[0] = '\0';

    /* Split string into tokens */
    element = strtok(text, " ");
    if (element != NULL)
        strcpy(*output, element);

    while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        if (strstr(*output, element) == NULL) {
            strcat(*output, " " );
            strcat(*output, element );
        }
    }
}

Updated Version of Code (@Rohan)

Input: This is a is is simple test simple for lesson2 Quit

Output: This is a is is simple test simple for lesson2 Quit

void RemoveDuplicates(char text[], size_t text_size, char** output)
{
    char *temp = NULL;
    char *element;
    /* Allocate size for output. */
    *output = (char*) malloc(text_size);
    *output[0] = '\0';

    /* Split string into tokens */
    element = strtok(text, " ");
    if (element != NULL)
        strcpy(*output, element);

    while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        temp = strstr(*output, element);
        /* check for space before/after it or '\0' after it. */
        if (temp == NULL || temp[-1] == ' ' || temp[strlen(element)] == ' ' || temp[strlen(element)] == '\0'  ) {

            strcat(*output, " " );
            strcat(*output, element );
        }
    }
}
Viktor Apoyan
  • 10,655
  • 22
  • 85
  • 147

3 Answers3

4

You need to check for a word in element instead of plain string.

What are you getting is that, in your input string there are 2 "is" one is part of "This" while another is actual word "is".

 This is a simple test for lesson2 Quit lesson2
 --^ -^  

strstr() finds both strings, and removes 2nd "is". But you need to find only duplicate words.

You can do that by checking for spaces ' ' before and after the found word. In case its last word check for '\0' at the end.

Try updating your while loop as:

char temp[512] = { 0 }; //use sufficient array
while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        //create word
        sprintf(temp, " %s ", element);
        if(strstr(*output, temp) == NULL) {
            strcat(*output, " " );
            strcat(*output, element );
        }
    }
Rohan
  • 52,392
  • 12
  • 90
  • 87
0

Disclaimer: This will not fix your algorithm, see @Rohans answer for the fix in your algorithm.


To fix your code, do the following:

*output = (char*) malloc(text_size);

... should be:

char *output = malloc(text_size);

... and change:

*output[0] = '\0';

... to be:

output[ 0 ] = '\0';

... do not cast a malloced block of memory. You can read more about this here. Observe that output[ 0 ] implies *( output + 0 ).

Next, change:

strcpy(*output, element);

... to:

strcpy(output, element);

... then change:

if (strstr(*output, element) == NULL) {
  strcat(*output, " " );
  strcat(*output, element );
}

... to:

if (strstr(output, element) == NULL) {
  strcat(output, " " );
  strcat(output, element );
}

... notice that output is already a pointer, using the * as you do dereferences that pointer which returns a character. strstr and strcpy require that dest is a pointer to an array of characters, a character.

Community
  • 1
  • 1
Jacob Pollack
  • 3,703
  • 1
  • 17
  • 39
  • Here's a link to a (pretty well-regarded answer) on [why you should not cast the return value of `malloc()` in C](http://stackoverflow.com/a/605858/28169). – unwind Aug 14 '13 at 07:35
  • @unwind, thank you for the link, included it in my answer -- I didn't want to include a low-level explanation with justification as it would get too off-topic. – Jacob Pollack Aug 14 '13 at 07:42
0

You could try something like this instead

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char test_text[] = "This is a is is simple test simple for lesson2 Quit";

int main(int argc, char* argv[])
{
  const int maxTokens = 200; // lets assume there is max 200 tokens in a sentence
  char* array[maxTokens];    // pointers to strings
  int unique_tokens = 0;     // number of unique tokens found in string

  // first tokenize the string and put it into a structure that is a bit more flexible
  char* element = strtok(test_text," ");
  for (; element != NULL; element = strtok(NULL, " "))
  {
     int foundToken = 0;
     int i;
     // do we have it from before?
     for (i = 0; i < unique_tokens && !foundToken; ++i)
     {
       if ( !strcmp(element, array[i]) )
       {
         foundToken = 1;
       }
     }

     // new token, add
     if ( !foundToken )
     {
       array[unique_tokens++] = (char*)strdup(element); // this allocates space for the element and copies it
     }
  }

  // now recreate the result without the duplicates.

  char result[256] = {0};
  int i;
  for (i = 0; i < unique_tokens; ++i)
  {
    strcat(result,array[i]);
    if ( i < unique_tokens - 1 )
    {
      strcat(result," ");
    }
  }

  puts( result );

  return 0;
}
AndersK
  • 35,813
  • 6
  • 60
  • 86