0

The following code I found on tutorialgateway is used to remove all duplicate characters from a string.

#include <stdio.h>
#include <string.h>

int main()
{
    char str[100];
    int i, j, k;

    printf("\n Please Enter any String :  ");
    gets(str);

    for(i = 0; i < strlen(str); i++)
    {
        for(j = i + 1; str[j] != '\0'; j++)
        {
            if(str[j] == str[i])
            {
                for(k = j; str[k] != '\0'; k++)
                {
                    str[k] = str[k + 1];

                }
            }
        }
    }

    printf("\n The Final String after Removing All Duplicates = %s \n", str);

    return 0;
}

I thought if I changed the == for != on the if statement it would instead remove all unique or non duplicate characters, but instead it does something else. What am I not understanding about how this code works?

  • Please _edit_ your question to clarify. When we "remove" all duplicate chars, given `abbbc`: Do we want `abc` or `ac` as the result? – Craig Estey Feb 01 '22 at 19:08
  • 1
    Ideally remove all chars that appear more than once, so `ac` would be correct. – André Marques Feb 01 '22 at 19:11
  • Hmmm "remove all non duplicate characters from a string." --> as the _null character_ is part of the _string_ (As defined by the C library) - and there is only 1 in a string, I hope you do not want to remove that too. – chux - Reinstate Monica Feb 01 '22 at 19:34
  • @chux-ReinstateMonica I think that is covered on the **for** loop with `str[] != '\0'`, no? – André Marques Feb 01 '22 at 19:48
  • I'd say the code covers not removing the null character. Yet the stated goal is not met even if the reasonable goal is met. IOWs, the problem is not the code, but the goal. Certainly not a big issue with learner code - just watch out for such ambiguities in production assignments. – chux - Reinstate Monica Feb 01 '22 at 19:54

1 Answers1

3

Your solution will take approximately O(n^3).

Also, having three nested loops adds needless complexity.

If we change the algorithm to use a frequency table, we can reduce complexity. And, we can reduce the running time to 2*O(n) or O(n).

Side note: Never use gets. See: Why is the gets function so dangerous that it should not be used?

Here is the refactored code:

#include <stdio.h>
#include <string.h>

int
main()
{
    char str[100];
    char freq[256] = { 0 };
    char *src;
    char *dst;
    unsigned char chr;

    printf("\n Please Enter any String :  ");
    fgets(str,sizeof(str),stdin);
    str[strcspn(str,"\n")] = 0;

    printf("\n The Original String = %s \n", str);

    // get frequency count
    src = str;
    for (chr = *src++;  chr != 0;  chr = *src++) {
        switch (freq[chr]) {
        case 0:  // new [unique] char
            freq[chr] = 1;
            break;

        default:  // duplicate
            freq[chr] = 2;
            break;
        }
    }

    dst = str;
    src = str;

    // copy over chars that are unique
    for (chr = *src++;  chr != 0;  chr = *src++) {
        if (freq[chr] == 1)
            *dst++ = chr;
    }

    *dst = 0;

    printf("\n The Final String after Removing All Duplicates = %s \n", str);

    return 0;
}

Here is the program output:


 Please Enter any String :
 The Original String = abbbbbbbbbccdefghe

 The Final String after Removing All Duplicates = adfgh
Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • Nice. Just a comment for the pedantic, _signed_ `char` and non 2's complement encoding (which will likely die on next C update): `char *src; char *dst;` better as `unsigned char *src, *dst;` or the like to properly distinguish +0 from -0 and avoid a trap. – chux - Reinstate Monica Feb 01 '22 at 19:46
  • Thank you for the note on `gets`, I haven't learned about fgets yet but I'll look into that. What do you mean by "O(n^3)"? – André Marques Feb 01 '22 at 19:52
  • That is "big O notation". See: https://en.wikipedia.org/wiki/Big_O_notation Because C doesn't have a "raise to a power" operator, here, `n^3` is "n cubed" or "n to the third power". It is sometimes written as `n**3`, because in some languages that _do_ have power operators, like Fortran, the power operator _is_ `**`. I've seen it both ways. – Craig Estey Feb 01 '22 at 21:09