5

I have a word list and a file containing a number of anagrams. These anagrams are words found in the word list. I need to develop an algorithm to find the matching words and produce them in an output file. The code I have developed so far has only worked for the first two words. In addition, I can't get the code to play nice with strings containing numbers anywhere in it. Please tell me how I can fix the code.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main (void)
{
int x = 0, y = 0;
int a = 0, b = 0;
int emptyx, emptyy;
int match = 0;
ifstream f1, f2;
ofstream f3;
string line, line1[1500], line2[50];
size_t found;

f1.open ("wordlist.txt");
f2.open ("file.txt");
f3.open ("output.txt");

while (f1.eof() == 0)
{
    getline (f1, line);
    line1[x] = line;
    x++;
}

while (f2.eof() == 0)
{
    getline (f2, line);
    line2[y] = line;
    y++;
}

//finds position of last elements
emptyx = x-1;
emptyy = y-1;

//matching algorithm
for (y = 0; y <= emptyy; y++)
{
    for (x = 0; x <= emptyx; x++)
    {
        if (line2[y].length() == line1[x].length())
        {
            for (a = 0; a < line1[x].length(); a++)
            {
                found = line2[y].find(line1[x][a]);
                if (found != string::npos)
                {
                    match++;
                    line2[y].replace(found, 1, 1, '.');

                    if (match == line1[x].length())
                    {
                        f3 << line1[x] << ", ";
                        match = 0;
                    }
                }
            }
        }
    }
}

f1.close();
f2.close();
f3.close();

return 0;
}
Abhay
  • 7,092
  • 3
  • 36
  • 50
  • Similar question, excellent answer through LINQ [Algoithm for Grouping anagram words](http://stackoverflow.com/questions/396005/algorithm-for-grouping-anagram-words?rq=1) – Vbp Mar 11 '14 at 22:27

2 Answers2

6

Step 1: Build an index with a key of the sorted characters in each word in the wordlist and with the value being the the word.

act   -  cat
act   -  act
dgo   -  dog

...

aeeilnppp - pineapple

....

etc...

Step 2: For each anagram you want to find, sort the characters in your anagram word, and then match against the index to retrieve all words from index with matching sorted key.

Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
  • 1
    +1, this is the way to go. Just explain it a little better. maybe a (pseudo)code-sample? – Jan Jun 21 '11 at 08:57
  • 1
    Could you please elaborate? Do I create a new list with the sorted words for each of the the word list and the anagram file, and then find matches? How would that work when the string contains numbers? I'm having a hard time trying to manipulate them. –  Jun 21 '11 at 09:06
  • @geft: that;s the first mention of numbers! I thought you were talking about words? – Mitch Wheat Jun 21 '11 at 09:07
  • Yeah, most of the words are plain words, but some of them contain numbers like dog1 or 1cat. Sorry I wasn't being clear. Well, I can just remove them manually from the lists, but that would be cheating I suppose. –  Jun 21 '11 at 09:10
  • Thanks for the help! I just saw your edited answer. Initially I considered using overly complicated methods such as assigning prime numbers to each word, but I suppose yours is the best. –  Jun 21 '11 at 09:13
3

Trying to improve Mitch Wheat's solution:

  • Storing both sorted order and the word is really not necessary - store only the sorted string for every word in list.

  • Anyways, when we read a word from the file we have to sort it to find if it is equal to the sorted string - and the index is indexed on sorted string, so it will not help anyways.

  1. Build a 'position independent' hash with words in word list - also store the sorted string in the hash.

  2. For every word in file, get the 'position independent' hash and check in hashtable.

  3. If hit, sort and compare to every sorted string stored at this position in hash (collisions!).

Thoughts?

Jason Plank
  • 2,336
  • 5
  • 31
  • 40
excalibur
  • 31
  • 3