0

Possible Duplicate:
What is an easy way to tell if a list of words are anagrams of each other?
finding if two words are anagrams of each other

I have below C code written to check if two given strings are anagrams of each other. I know this is worst in terms of complexity/efficiency and there are far many better ways to do it.

#include "stdio.h"

main()
{
char s1[]="mist";
char s2[]="mitt";
int i,j,isanag=0;

if(strlen(s1)!=strlen(s2))
    printf("Not anagrams\n");

for(i=0;i<strlen(s1);i++)
{
    isanag=0;
    for(j=0;j<strlen(s2);j++)
    {
        if(s1[i]==s2[j])
        {
            isanag = 1;
            break;
        }
    }
    if(isanag == 0)
    {
        printf("Not anagrams\n");
        getch();
        exit(0);
    }

}

printf("Yes Anagrams\n");
getch();
}

This works fine and prints Not Anagrams which is correct If I swap the names of two strings as below it gives wrong answer

char s1[]="mitt";
char s2[]="mist";

I know the way the 2 for loops are coded this is obvious.

What can I do to improve this code and resolve this quirk?

Community
  • 1
  • 1
goldenmean
  • 18,376
  • 54
  • 154
  • 211
  • Where did you live the return type of poor `main()`? –  Oct 21 '12 at 12:47
  • 3
    Your algorithm is wrong, because it only checks if all distinct characers of `s1` are present in `s2` but does **not** check if there are characters in `s2` that are not in `s1`. That's why `"mitt"` is reported as an anagram of `"mist"`; the `'s'` in `s2` is just ignored. Your program also fails to detect different number of repetated letters, i.e., `"mistmt"` vs `"mist"` – C2H5OH Oct 21 '12 at 12:50
  • Your algorithm is among the worst possible. Its complexity is `O(n^2 * m^2)` where n & m are lengths. Check out the dup for better answer. – P.P Oct 21 '12 at 12:55
  • @C2H5OH Are you stealing my name? :D –  Oct 21 '12 at 13:31
  • @KingsIndian: If n and m are lengths of 2 strings, I dont think complexity of this one is O(n^2*m^2), i see this is O(n*m). Care to elaborate. – goldenmean Oct 21 '12 at 13:58
  • @goldenmean because you are calling `strlen` for every iteration in both loops, which is equal to another iteration unless compiler helps you. – P.P Oct 21 '12 at 14:00
  • @Kingsindian: I agree but the strlen() call would be outside. And since that would be constants which is m and n so it should be O(n*m).. – goldenmean Oct 21 '12 at 14:09
  • @goldenmean How does it return *constant* ? `strlen` has to loop through the array to give the length. Unless compiler optimizes it, it's going to do the same for every iteration. – P.P Oct 21 '12 at 14:11

5 Answers5

6

Considering only lowercase letter you can make 2 vectors of lenght 26 each.

set all positions to 0 in both of them, make a loop in the first string (s1) and increment the positions in the vector:

   int v1[26], v2[26], i;
   for( i = 0; i < 26; i++)
        v1[i] = v2[i] = 0;
   for(i = 0; i < strlen(s1); i++)
        v1[s1[i] - 'a']++; // 'a' would be on position 0, 'b' on position 1 and so on
   for(i = 0; i < strlen(s2); i++)
        v2[s2[i] - 'a']++; 

After it you just loop in the vectors and see if the amount of letters is different

   for(i = 0; i < 26; i++)
        if(v1[i] != v2[i])
        {
            printf("Not anagrams");
            exit(0);
        }
    printf("Anagrams");

but if you use uppercase you can make 4 vectors, and for the new one subtract by 'A' or make a bigger vector and put some if's in your code... I'll let you try that one ;)

vmp
  • 2,370
  • 1
  • 13
  • 17
  • 1
    good solution. If you change the 26 in 256 and lose the - 'a', that will work comparing any string of bytes, capitals included. – Jeroen Vuurens Oct 21 '12 at 13:01
  • You could also use `size_t` as the count type, since there do exist systems on which it's possible to create an array larger than the max value of `int`. You don't want to report a string containing 2^32 "a"s as an anagram of a string containing 2^32 "b"s (or of an empty string). – Steve Jessop Oct 21 '12 at 13:43
4

@Goldenmean, here's another solution which sorts the two strings and then compares them. Character counting as others have discussed will be more efficient, but this is more interesting :-)

qsort is a standard-library sorting function that takes the function comp as a parameter and uses it to compare elements of a list (a string in this case) as it re-orders the list.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

static int comp(const void *a, const void *b)
{
    const char *pa = a;
    const char *pb = b;

    return
        (*pa > *pb) ?  1 :
        (*pa < *pb) ? -1 :
        0;
}

int main(int argc, char ** argv)
{
    char s1[]="mits";
    char s2[]="mist";

    qsort(s1, strlen(s1), 1, comp);
    qsort(s2, strlen(s2), 1, comp);

    printf("%s : %s  - %s\n", s1, s2, strcmp(s1, s2) ? "No" : "Yes");
    return 0;
}
William Morris
  • 3,554
  • 2
  • 23
  • 24
  • 1
    "this is more interesting" +1 because as well as being more interesting, it doesn't make any particular assumption about the alphabet (well, for a Unicode alphabet it assumes canonicalization). The "for lower-case strings only and assuming ASCII so that `s[i] - 'a'` is in range 0 to 25)" is a fine hack, but it's nice to have a reasonably-fast and more-general fallback. – Steve Jessop Oct 21 '12 at 13:31
3

Based on vmp's solution you can do this with one char array[26].

  1. Iterate over the first string, increment the array element for the letter by one.
  2. Iterate over the second string, decrement the array by one.
  3. Iterate over the alphabet array and assert all elements are zero.

Edit: Added some code (no compiler at hand, but concept should be ok)

//lower case only
int isAnagram( char* left, char* right)
{
   char theAlphabet[26];

   memset( theAlphabet, 0, sizeof theAlphabet );


   int length = strlen( left );

   for( int i=0; ++i; i < length )
   {
      if ( 0 == right[i] )
      { //mismatching length
        return 0; 
      }

     ++theAlphabet[ left[i] - 'a' ];
     --theAlphabet[ right[ i ] - 'a' ];

   }

   if ( left[length] != 0
       || right[length] != 0 )
   {
     return 0;
   }


   for( int j=0; ++j; j < 26 )
   {
      if ( 0 != theAlphabet[j] )
      {
        return 0;
      }
   }

   return 1; //yes it is an anagram
}
Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
Mario The Spoon
  • 4,799
  • 1
  • 24
  • 36
2

You haven't coded for the possibility of repeated letters.

Take these two strings:

char s1[]="mitt";
char s2[]="mist";

For each letter in the first string, your code checks each letter in the second string to see if there's an identical letter, anywhere. So let's go through the first string and check for the first identical letter in the second (this is what your code does):

s1[0] ('m'): yes, at s2[0]
s1[1] ('i'): yes, at s2[1]
s1[2] ('t'): yes, at s2[3]
s1[3] ('t'): yes, at s2[3]

As you can see, the code thinks the two strings are anagrams, because it matched two letters in the first string to only one letter in the second!

The solution would be to 'cut out' the letters you've already matched before going onto the next letter; I'll leave that to you to code! Good luck.

EDIT: And, of course, I forgot: if the code successfully manages to go all the way through string 1, cutting out letters from string 2, and there are letters left in string 2, that means that they're not anagrams! (In the example above, 's' would be left in string 2.)

Archimaredes
  • 1,397
  • 12
  • 26
1

I am sorry, but your implementation is flawed. This code:

for(j=0;j<strlen(s2);j++)
{
    if(s1[i]==s2[j])
    {
        isanag = 1;
        break;
    }

only requires that any character in the second string be present in the first string.

So since the letters of "mitt" are all within "mits", and length is the same, it reports they are anagrams. The reverse is not true.

Even if you repeated the check in the other direction, this would still not work.

So for example

mitttsson

missstton

would appear to be anagrams, since they are of the same length and both are generated by the set {m,i,t,s,o,n}.

You must check not only that the characters are equal, but also how many times they occur in the strings.

This is an (inefficient, for it calculates repeatedly repeated letters) variation:

for (i = 0; i < strlen(s1); i++)
{
    int c = 0;
    // How many times does character s1[i] occur in s1?
    for (j = 0; j < strlen(s1); j++)
    {
        if (s1[i] = s1[j])
        {
            // Improvement: if j < i, it means we already checked this letter.
            // so we might break here...
            c++;
        }
    }
    // improvement: if c is 0 here, it means we can 'continue' for this letter
    // has been already checked before.

    // Subtract how many times it occurs in s2.
    for (j = 0; j < strlen(s2); j++)
    {
        if (s1[i] = s2[j])
            c--;
    }
    // If occurrences are the same we expect difference to be 0.
    if (c)
    {
        printf("Not anagrams\n");
        break;
    }
}

UPDATE: the best solution is to change the algorithm altogether, as per vmp's or (even better) Mario the Spoon's.

LSerni
  • 55,617
  • 10
  • 65
  • 107