C++ Finding Anagrams in words

Question

I'm working on a program that looks at whether or not a particular word is an anagram using std:count however, I don't think my function logic is correct and I cannot seem to figure it out.

Assume there are the following words in the file:

Evil
Vile
Veil  
Live

My code is as follows:

#include <iostream>
#include <vector>
#include <fstream>
#include <map>
using namespace std;

struct Compare {
std::string str;
Compare(const std::string& str) : str(str) {}
};

bool operator==(const std::pair<int, std::string>&p, const Compare& c) {
return c.str == p.second;
}
   bool operator==(const Compare& c, const std::pair<int, std::string>&p) {
   return c.str == p.second;
}

std::vector<std::string> readInput(ifstream& file)
{
std::vector<std::string> temp;

string word;

while (file >> word)
{
    temp.push_back(word);
}
std::sort(temp.begin(), temp.end());

return temp;
}

int main(int argc, char *argv[]) {  

string file = "testing.txt";
ifstream ss(file.c_str());

if(!ss.is_open())
{
    cerr << "Cannot open the text file";
}

std::vector<std::string> words = readInput(ss);

std::map<int, std::string> wordsMap; 

//std::map<std::string value, int key> values; 

for(unsigned i=0; (i < words.size()); i++)
{
    wordsMap[i] = words[i];
}


int count = std::count(wordsMap.begin(), wordsMap.end(), Compare("Evil"));
cout << count << endl;
}

I'm pretty sure it's just a case of my logic is wrong in the functions. I hope someone can help :)

You don't need a comparator class for `std::string`, it overloads `operator==` for this purpose. — jrok, Sep 16 '13 at 17:48
@jrok Thanks for the reply. But for me to determine the Anagram, I need to be able to access the element of str[i...n], right? — Phorce, Sep 16 '13 at 17:50
Can you please clarify a bit. Are you trying to figure out for each word in the file if it has another word in the same file that is an anagram of it? Can you provide what the expected output is? — masahji, Sep 16 '13 at 17:52
@masahji So the expected output here would be: 3 because each of the words inside the textfile are anagrams of the word "Evil" — Phorce, Sep 16 '13 at 17:55
@PHorce Thanks for the clarification. Your file actually had 4 lines in it (each an anagram of "Evil" which was passed as an argument to `Compare`). So wouldn't it output 4? — masahji, Sep 16 '13 at 18:12

score 2 · Accepted Answer · answered Sep 16 '13 at 18:13

2

The most simple approach would be

To check like following (pseudo code)

bool isAnagram(string s, string t) {return sort(s) == sort(t); }

So, use some think like following, no need of std::map

struct Compare {
std::string str;
Compare(const std::string& x) : str(x) { 
    std::sort(str.begin(),str.end()); std::transform(str.begin(), 
    str.end(),str.begin(), ::toupper);}

    bool operator ()(const std::string& t)
    {
        std::string s= t;
        std::transform(s.begin(), s.end(),s.begin(), ::toupper);
        std::sort(s.begin(),s.end());

    return s == str;
    }
};

And then

int count = std::count_if(words.begin(), words.end(), Compare("Evil"));

See HERE

answered Sep 16 '13 at 18:13

P0W

46,614
9
72
119

Thank you. But, does the Compare method have to sort each time? When working with large values (total number of words) this becomes really slow.. – Phorce Sep 16 '13 at 18:20
@Phorce yeah agreed ! – P0W Sep 16 '13 at 18:24
But, for example: the `int count = std::count_if` would therefore be in a for loop so like so: `for(unsigned i=0; (i < words.begin()); i++) { int count = std::count_if(words.begin(), words.end(), Compare(words[i]);` but something does not seem right – Phorce Sep 16 '13 at 18:25

Abhishek Bansal · Answer 2 · 2013-09-16T18:24:01.577

1

EDIT: It seems in your present code, you are checking whether the strings are exactly equal to each other (not anagrams).

INSTEAD:
For each word, make an array of 26 elements, each element corresponding to a letter of the alphabet. Parse each word character by character, and increase the count of the particular character in the respective array.

For example for evil, the array would be:

0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0. // It has 1's for letters e,v,i and l
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z

You make this array for each word that you have. In your case, all the words will have the same array. You then compare these arrays element-wise and proceed accordingly.

Now you just need to see which words have the same corresponding array.

If you want to compare all the N words pair-wise, you can do so using two nested loops in O(N^2) complexity.
The complexity for comparing one pair is O(1).
Complexity of creating the arrays = O(L) where L is the length of the string.

edited Sep 16 '13 at 18:24

answered Sep 16 '13 at 17:53

Abhishek Bansal

12,589
4
31
46

No, you need to change your comparing functions to check for the arrays of elements rather than the strings themselves. – Abhishek Bansal Sep 16 '13 at 18:00
Could you give an example please? I'm confused as to what exactly you mean by creating a 26 element array, I don't see how this would work using the code I supplied. – Phorce Sep 16 '13 at 18:01
I think the complexity is O(n) if we follow your technique and not O(n^2). The reason being that we traverse the input string once to fill the above array - this takes O(n) time. We then traverse the string to be tested for anagram once - this takes another O(n) time in the worst case. Array element accesses are O(1) hence the overall complexity for each test would be O(n) + O(n) + O(1) = O(n) – Prahalad Deshpande Sep 16 '13 at 18:19
Yes you are right. I said O(n^2) because I assumed that all the words shall be compared with each other. n = number of words (not length of string). Sorry I shall edit my answer. – Abhishek Bansal Sep 16 '13 at 18:21

score 1 · Answer 3 · answered Sep 16 '13 at 18:08

This is not the most efficient algorithm, but a quick change to your program that would work could be:

bool operator==(const std::pair<int, std::string>&p, const Compare& c) {
  std::string a = c.str;
  std::transform(a.begin(), a.end(), a.begin(), ::tolower);
  std::sort(a.begin(), a.end());

  std::string b = p.second;
  std::transform(b.begin(), b.end(), b.begin(), ::tolower);
  std::sort(b.begin(), b.end());

  return a == b;
}

score 0 · Answer 4 · answered Sep 17 '13 at 06:59

0

Consider the following:

map<string, set<string>> anagrams;

for (auto word : words)
    anagrams[sort(word)].insert(word);

const set<string>& find_anagrams(const string& word)
{
    return anagrams[word];
}

answered Sep 17 '13 at 06:59

Andrew Tomazos

66,139
40
186
319

score 0 · Answer 5 · edited May 23 '17 at 10:31

When you have a lot of words that are relatively short (or if you can work with large number libs), then you can use a solution similar to what I wrote here -

Generate same unique hash code for all anagrams

Essentially - map each character to a unique prime number (doesn't have to be big, you can map the entire ABC into primes up to 101), and for each word multiply the primes received from it characters. Since multiplication is commutative, anagrams would give the same result, so you just compare that result, hash it, or do whatever you want

Keep in mind that for long words the values would grow pretty fast, so you might need a big numbers lib

C++ Finding Anagrams in words

5 Answers5

Linked