How to count the number of distinct characters in common between two strings?

Question

How can a program count the number of distinct characters in common between two strings?

For example, if s1="connect" and s2="rectangle", the count is being displayed as 5 but the correct answer is 4; repeating characters must be counted only once. How can I modify this code so that the count is correct?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int i,j,count=0;
    char s1[100],s2[100];
    scanf("%s",s1);//string 1 is inputted
    scanf("%s",s2);//string 2 is taken as input
    for(i=1;i<strlen(s1);i++)
    {
        for(j=1;j<strlen(s2);j++)
        {
            if(s1[i]==s2[j])//compare each char of both the strings to find common  letters
            {
                count++;//count the common  letters
                break;
            }
        }

    }
    printf("%d",count);//display the count

}

The program is to take two strings as input and display the count of the common characters in those strings. Please let me know what's the problem with this code.

I wonder that you start both loops with 1. In C, array indexing always start with 0. I wonder especially because your `for` conditions tests correctly for `< strlen()` i.e. stopping after `strlen() - 1`. — Scheff's Cat, Jul 28 '18 at 14:55
you need to more precisely define you problem. what *exactly* is the problem you trying to solve? why is the correct answer 4 and not 5? *the program is to take two strings as input and disiplay the count of the common characters in those strings* For instance, how are you supposed to handle the double "n" in connect? — MFisherKDX, Jul 28 '18 at 14:58
After having written a program and ensured that it's compiling without errors (and at best without warnings), the next step is to check whether the behavior is as expected. From my experience, mostly, it is not or it is not quite. This is the time to use the debugger and find out where the algorithm does not match the expectations. So, you should now google/learn how to use a debugger to find out what's going on/wrong. — Scheff's Cat, Jul 28 '18 at 15:01
sorry both i and j must start with zero but still im unable to get the correct output — lakshmi chandhana, Jul 28 '18 at 15:03
@Scheff, I suspect debugging the algorithm is not the OP's problem. It's the algorithm itself. The OP is not handling the double "n" in connect. The "correctness" depends on the precise definition of *common characters in those strings*. — MFisherKDX, Jul 28 '18 at 15:11
@MFisherKDX Yeah, in general you are right. In some cases, the debugger taught me that I had wrong expectations/imaginations about the algorithm. Everything felt well as long as I ran it in my head but after seeing numbers black on white, I saw my fault (in design)... ;-) In other words, the debugger is good as well to show you that the wrong algorithm is implemented properly. — Scheff's Cat, Jul 28 '18 at 15:15

score 2 · Answer 1 · answered Jul 28 '18 at 16:09

If repeating characters must be ignored, the program must 'remember' the character which were already encountered. You could do this by storing the characters which were processed into a character array and then consult this array while processing the other characters.

You could use a counter variable to keep track of the number of common characters like

int ctr=0;
char s1[100]="connect", s2[100]="rectangle", t[100]="";

Here, t is the character array where the examined characters will be stored. Its size is made to be same as the size of the largest of the other 2 character arrays.

Now use a loop like

for(int i=0; s1[i]; ++i)
{
    if(strchr(t, s1[i])==NULL && strchr(s2, s1[i])!=NULL)
    {
        t[ctr++]=s1[i];
        t[ctr]=0;
    }
}

t initially has an empty string. Characters which were previously absent in t are added to it via the body of the loop which will be executed only if the character being examined (ie, s1[i]) is not in t but is present in the other string (ie, s2).

strchr() is a function with a prototype

char *strchr( const char *str, int c );

strchr() finds the first occurrence of c in the string pointed to by str. It returns NULL if c is not present in str.

Your usage of scanf() may cause trouble.

Use

scanf("%99s",s1);

(where 99 is one less than the size of the array s1) instead of

scanf("%s",s1);

to prevent overflow problems. And check the return value of scanf() and see if it's 1. scanf() returns the number of successful assignment that it made.

Or use fgets() to read the string.

Read this post to see more about this.

And note that array indexing starts from 0. So in your loops, the first character of the strings are not checked.

So it should've been something like

for(i=0;i<strlen(s1);i++)

instead of

for(i=1;i<strlen(s1);i++)

Jonathan Leffler · Answer 2 · 2018-07-28T20:01:41.210

Here's a solution that avoids quadratic O(N²) or cubic O(N³) time algorithms — it is linear time, requiring one access to each character in each of the input strings. The code uses a pair of constant strings rather than demanding user input; an alternative might take two arguments from the command line and compare those.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    char bytes[UCHAR_MAX + 1] = { 0 };
    char s1[100] = "connect";
    char s2[100] = "rectangle";

    for (int i = 0; s1[i] != '\0'; i++)
        bytes[(unsigned char)s1[i]] = 1;

    for (int j = 0; s2[j] != '\0'; j++)
    {
        int k = (unsigned char)s2[j];
        if (bytes[k] == 1)
        {
            bytes[k] = 0;
            count++;
        }
    }

    printf("%d\n",count);
    return 0;
}

The first loop records which characters are present in s1 by setting an appropriate element of the bytes array to 1. It doesn't matter whether there are repeated characters in the string.

The second loop detects when a character in s2 was in s1 and has not been seen before in s2, and then both increments count and marks the character as 'no longer relevant' by setting the entry in bytes back to 0.

At the end, it prints the count — 4 (with a newline at the end).

The use of (unsigned char) casts is necessary in case the plain char type on the platform is a signed type and any of the bytes in the input strings are in the range 0x80..0xFF (equivalent to -128..-1 if the char type is signed). Using negative subscripts would not lead to happiness. The code does also assume that you're working with a single-byte code set, not a multi-byte code set (such as UTF-8). Counts will be off if you are dealing with multi-byte characters.

The code in the question is at minimum a quadratic algorithm because for each character in s1, it could step through all the characters in s2 only to find that it doesn't occur. That alone requires O(N²) time. Both loops also use a condition based on strlen(s1) or strlen(s2), and if the optimizer does not recognize that the value returned is the same each time, then the code could scan each string on each iteration of each loop.

Similarly, the code in the other two answers as I type (Answer 1 and Answer 2) are also quadratic or worse because of their loop structures.

At the scale of 100 characters in each string, you probably won't readily spot the difference, especially not in a single iteration of the counting. If the strings were bigger — thousands or millions of bytes — and the counts were performed repeatedly, then the difference between the linear and quadratic (or worse) algorithms would be much bigger and more easily detected.

I've also played marginally fast'n'loose with the Big-O notation. I'm assuming that N is the size of the strings, and they're sufficiently similar in size that treating N₁ (the length of s1) as approximately equal to N₂ (the length of s2) isn't going to be a major problem. The 'quadratic' algorithms might be more formally expressed as O(N₁•N₂) whereas the linear algorithm is O(N₁+N₂).

L_J · Answer 3 · 2018-07-28T15:28:04.823

0

Based on what you expect as output you should keep track which char you used from the second string. You can achieve this as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int i, j, count = 0, skeep;
    char s1[100], s2[100], s2Used[100]{0};
    scanf("%s", s1);    //string 1 is inputted
    scanf("%s", s2);    //string 2 is taken as input
    for (i = 0; i<strlen(s1); i++)
    {
        skeep = 0;
        for (j = 0; j < i; j++)
        {
            if (s1[j] == s1[i])
            {
                skeep = 1;
                break;
            }
        }

        if (skeep)
            continue;

        for (j = 0; j<strlen(s2); j++)
        {
            if (s1[i] == s2[j] && s2Used[j] == 0)   //compare each char of both the strings to find common  letters
            {
                //printf("%c\n", s1[i]);
                s2Used[j] = 1;
                count++;//count the common  letters
                break;
            }
        }

    }
    printf("%d", count);//display the count
}

edited Jul 28 '18 at 15:28

answered Jul 28 '18 at 15:12

L_J

2,351
10
23
28

How many common characters do "here" and "there" share? I can't tell from the problem description what the expected output should be. – MFisherKDX Jul 28 '18 at 15:15
@MFisherKDX just look at the title – L_J Jul 28 '18 at 15:16
The title doesn't tell me the expected output of the words "here" and "there". ;) – MFisherKDX Jul 28 '18 at 15:17
the answer must be 3 it should not take the repeating letters – lakshmi chandhana Jul 28 '18 at 15:17
2

@lakshmichandhana ... now we are starting to get somewhere. ;) I suggest you edit your question to define the problem more clearly. Then you will get some suggestions on how to modify your algorithm to get the expected behavior. :) – MFisherKDX Jul 28 '18 at 15:20
@lakshmichandhana I modified codes to fit your problem – L_J Jul 28 '18 at 15:29
1

Which C compiler accepts `char …, s2Used[100]{0};`? That's more like a C++1x initializer construct, isn't it? – Jonathan Leffler Jul 28 '18 at 19:27
@JonathanLeffler Yeah, that's true – L_J Jul 28 '18 at 19:40

How to count the number of distinct characters in common between two strings?

3 Answers3