12

You are given a dictionary of 3 letter words and have to find a matrix of 3x3 such that each row, column and diagonal forms a word in the dictionary. The words in the dictionary are sorted and you can assume O(1) time for retrieving a word from the dictionary.

This has been asked as a Facebook interview question.

Michael J. Barber
  • 24,518
  • 9
  • 68
  • 88
Sachin
  • 3,672
  • 9
  • 55
  • 96
  • 1
    without the diagonal, it sounds similar to the [crossword](http://en.wikipedia.org/wiki/Crossword) problem, which is [NP-Complete](en.wikipedia.org/wiki/NP-complete) – amit Oct 01 '11 at 17:26
  • Can a word be used more than once? – hamstergene Oct 01 '11 at 17:26
  • A word has to be used only once I don't know how but I think that this can be solved using dynamic programming and/or backtracking. – Sachin Oct 01 '11 at 17:47
  • 1
    @amit: For fixed matrix size and fixed alphabet (like in this case) the number of all possible solutions is constant, so an O(1) algorithm can compute all possible matrices and then test each one against a dictionary for O(1) time :) – Ilia K. Oct 01 '11 at 17:53
  • It can probably be solved with backtracking by checking all possibilities, but run time will be exponentional. If this problem is indeed NP-Hard like its sister without the diagonal, it is probably your best shot. – amit Oct 01 '11 at 17:53
  • So the best way is to just keep building words and whenever a word is not found in the dictionary I just back track as till the error is not resolved? Is there no chance of a better algo? – Sachin Oct 01 '11 at 17:55
  • Depending on the size, you could pre-compute the entire table of valid combinations and store it. This might be feasable if you come up with a compact representation. I think this is @llia 's point. You could use tries on your word list to speed up the pre-calculations. – Merlyn Morgan-Graham Oct 01 '11 at 18:03
  • In this [answer to the question **Scrabble tile checking**](http://stackoverflow.com/questions/4854478/scrabble-tile-checking/4855134#4855134), a prefix tree was a key part of the algorithm. – Christian Ammer Oct 01 '11 at 18:20

4 Answers4

6

My approach would be to filter the dictionary first to create two new dictionaries: The first contains all single letter prefixes of words (of which there are probably 26), and the second contains all double letter prefixes of words (of which there are fewer than 26^2 since no word starts with BB, for example).

  1. Choose a word from the dictionary, call it X. This will be the first row of the matrix.

  2. Check that X1, X2, X3 are all valid single letter prefixes using that handy list you made. If they are, go on to step 3; otherwise go back to step 1.

  3. Choose a word from the dictionary, call it Y. This will be the second row of the matrix.

  4. Check that X1 Y1, X2 Y2, X3 Y3 are all valid double letter prefixes using that handy list you made. If they are, go on to step 5; otherwise go back to step 3. If this was the last word in the dictionary, then go all the way back to step 1.

  5. Choose a word from the dictionary, call it Z. This will be the third row of the matrix.

  6. Check that X1 Y1 Z1, X2 Y2 Z2, X3 Y3 Z3 are all words in the dictionary. If they are, congrats, you've done it! Otherwise go back to step 5. If this was the last word in the dictionary, then go all the way back to step 3.

I coded this up in Maple and it works reasonably well. I left it running to find all such matrices and it turns out there are enough to crash Maple due to memory overflow.

PengOne
  • 48,188
  • 17
  • 130
  • 149
  • But what about the case of both the diagonals being valid words? I can get that you are check only for columns. Actually we could check the diagonals in the end, but if we construct the matrix row wise then we can check for the matrix as we are adding more rows. But then it would **complicate** the checking logic a bit – Sachin Oct 08 '11 at 19:25
  • 1
    check X1 Y2 as well and check that Y2 X3 is a suffix at step 4; check diagonals at step 6 – PengOne Oct 08 '11 at 19:32
1

Your comment suggest you are also looking for backtracking solution, which will be not efficient, but solves this. Pseudo-Code:

solve(dictionary,matrix):
  if matrix is full:
       if validate(dictionary,matrix) == true:
            return true
        else:
            return false
  for each word in dictionary:
      dictionary -= word
      matrix.add(word)
      if solve(dictionary,matrix) == true:
          return true
      else:
          dictionary += word
           matrix.removeLast()
   return false //no solution for this matrix.

In the above pseudo-code, matrix.add() adds the given word in the first non-occupied line. matrix.remove() removes the last occupied line, and validate() checks if the solution is a legal one.

Activation: solve(dictionary,empty_matrix), if the algorithm yields true, there is a solution and the input matrix will contain it, else it will yield false.

The above pseudo-code runs in exponential time! it is very non-efficient. However, since this problem resembles(*) the crossword problem, which is NP-Complete, it might be your best shot.

(*)The original Crossword-Problem doesn't have the diagonal condition that this problem has, and is of course more general: nxm matrix, not only 3x3. Though the problems are similar, a reduction doesn't pop to my head, and I will be glad to see one if exist.

amit
  • 175,853
  • 27
  • 231
  • 333
  • It would be great if you could cite a proof that crossword problem is NPC (which includes the exact definition of the problem, of course) – Ilia K. Oct 01 '11 at 18:34
  • 1
    @IliaK.: this [article](http://people.cs.uct.ac.za/~rbkmax001/project.pdf) on section 3.3 discusses the crossword problem and how to prove it is NPC. the crossword problem is basically: given a matrix with black and white squares, and a dictionary, can a valid crossword be built in it? note that there is another proof, that doesn't use any black squares, making a complete 'white' board also NP-Complete – amit Oct 01 '11 at 19:28
  • At worst it's O(dictionary size)^3, which isn't that bad for a dictionary of 3 character words. But I think if you used pruning and did it character by character (start from middle, then go to corners for maximal constraints early), you could find a solution very fast. – Rob Neuhaus Oct 01 '11 at 23:05
1
  • You find every unique set of 3 words.
  • You get all 6 possible matrices for those 3 words.
  • You do a dictionary check for the 5 words that could be created from those matrices (3 columns and 2 diagonals).

Some JavaScript to illustrate.

//setup a test dictionary
var dict = [
 "MAD",
 "FAD",
 "CAT",
 "ADD",
 "DOG",
 "MOD",
 "FAM",
 "ADA",
 "DDD",
 "FDD"
];
for(var i=0; i<dict.length; i++)
 dict[dict[i]]=true;

// functions
function find(dict) {
for(var x=0; x<dict.length; x++) {
for(var y=x+1; y<dict.length; y++) {
for(var z=y+1; z<dict.length; z++) {
 var a=dict[x];
 var b=dict[y];
 var c=dict[z];
 if(valid(dict,a,b,c)) return [a,b,c];
 if(valid(dict,a,c,b)) return [a,c,b];
 if(valid(dict,b,a,c)) return [b,a,c];
 if(valid(dict,b,c,a)) return [b,c,a];
 if(valid(dict,c,a,b)) return [c,a,b];
 if(valid(dict,c,b,a)) return [c,b,a];
}
}
}
return null;
}
function valid(dict, row1, row2, row3) {
 var words = [];
 words.push(row1.charAt(0)+row2.charAt(0)+row3.charAt(0));
 words.push(row1.charAt(1)+row2.charAt(1)+row3.charAt(1));
 words.push(row1.charAt(2)+row2.charAt(2)+row3.charAt(2));
 words.push(row1.charAt(0)+row2.charAt(1)+row3.charAt(2));
 words.push(row3.charAt(0)+row2.charAt(1)+row1.charAt(2));
 for(var x=0; x<words.length; x++)
  if(dict[words[x]] == null) return false;
 return true;
}

//test
find(dict);
Louis Ricci
  • 20,804
  • 5
  • 48
  • 62
  • I'd upvote this if I understood the pseudocode, the English is well said. – ninjagecko Oct 03 '11 at 19:23
  • Function "find" coincides with my first 2 bullet points, while function "valid" implements the 3rd bullet point. The pseudo code is working javascript, it can be debugged in any browser with a javascript console (ex: FireFox with FireBug). – Louis Ricci Oct 03 '11 at 19:33
1

I was not necessarily looking for a backtracking solution. It just struck to me that back tracking can be used, but the solution with that is a bit complicated. However we can use branch and bound and pruning to cut short the brute force technique.

Instead of searching for all possible combinations in the matrix, first we will select one string as the topmost row. Using the first character we find a suitable contender for the 1st column. Now using the 2 and 3 characters of the column string, we will find suitable words that can be fitted in the second and third row.

To efficiently find words beginning with a particular character we will use radix sort so that all words beginning with a particular character are stored in the same list. This when we have chosen the the second and third row of the matrix, we have a complete matrix.\

We will check whether the matrix is valid, by checking the 2nd and 3rd column and the diagonals form words that fall in the dictionary.

As and when we find that the matrix is valid we can stop. This helps in cutting down some of the possible combinations. However I feel this can be further optimized by considering another row or column, but then it would be a bit complicated. I am posting a working code below.

Please don't mind the naming of the functions, since I am an amateur coder, I generally do not give very appropriate names and some part of the code is hard coded for 3 letter words.

#include<iostream>
#include<string>
#include<algorithm>
#include<fstream>
#include<vector>
#include<list>
#include<set>

using namespace std;

// This will contain the list of the words read from the
// input file
list<string> words[26];

// This will contain the output matrix
string out[3];

// This function finds whether the string exits
// in the given dictionary, it searches based on the 
// first character of the string

bool findString(string in)
{
    list<string> strings = words[(int)(in[0]-'a')];
    list<string>:: iterator p;

    p = find(strings.begin(),strings.end(),in);
    if(p!=strings.end())
        return true;
}

// Since we have already chosen valid strings for all the rows
// and first column we just need to check the diagnol and the 
// 2 and 3rd column

bool checkMatrix()
{
    // Diagnol 1
    string d1;
    d1.push_back(out[0][0]);
    d1.push_back(out[1][1]);
    d1.push_back(out[2][2]);

    if(!(findString(d1)))
        return false;

    // Diagnol 2
    string d2;
    d2.push_back(out[0][0]);
    d2.push_back(out[1][1]);
    d2.push_back(out[2][2]);


    if(!(findString(d2)))
        return false;

    // Column 2
    string c2;
    c2.push_back(out[0][1]);
    c2.push_back(out[1][1]);
    c2.push_back(out[2][1]);

    if(!(findString(c2)))
        return false;

    // Column 3
    string c3;
    c3.push_back(out[0][2]);
    c3.push_back(out[1][2]);
    c3.push_back(out[2][2]);


    if(!(findString(c3)))
        return false;
    else
        return true;
    // If all match then return true
}

// It finds all the strings begining with a particular character

list<string> findAll(int i)
{
    // It will contain the possible strings
    list<string> possible;
    list<string>:: iterator it;

    it = words[i].begin();
    while(it!=words[i].end())
    {
        possible.push_back(*it);
        it++;
    }

    return possible;
}

// It is the function which is called on each string in the dictionary

bool findMatrix(string in)
{
    // contains the current set of strings
    set<string> current;

    // set the first row as the input string
    out[0]=in;
    current.insert(in);

    // find out the character for the column
    char first = out[0][0];

    // find possible strings for the column
    list<string> col1 = findAll((int)(first-'a'));
    list<string>::iterator it;

    for(it = col1.begin();it!=col1.end();it++)
    {
        // If this string is not in the current set
        if(current.find(*it) == current.end())
        {
            // Insert the string in the set of current strings
            current.insert(*it);

            // The characters for second and third rows
            char second = (*it)[1];
            char third = (*it)[2];

            // find the possible row contenders using the column string
            list<string> secondRow = findAll((int)(second-'a'));
            list<string> thirdRow = findAll((int)(third-'a'));

            // Iterators
            list<string>::iterator it1;
            list<string>::iterator it2;


            for(it1= secondRow.begin();it1!=secondRow.end();it1++)
            {
                // If this string is not in the current set
                if(current.find(*it1) == current.end())
                {

                    // Use it as the string for row 2 and insert in the current set
                    current.insert(*it1);

                    for(it2 = thirdRow.begin();it2!=thirdRow.end();it2++)
                    {
                        // If this string is not in the current set
                        if(current.find(*it2) == current.end())
                        {   

                            // Insert it in the current set and use it as Row 3
                            current.insert(*it2);

                            out[1]=*it1;
                            out[2]=*it2;

                            // Check ifthe matrix is a valid matrix
                            bool result = checkMatrix();

                            // if yes the return true
                            if(result == true)
                                return result;

                            // If not then remove the row 3 string from current set
                            current.erase(*it2);
                        }
                    }
                    // Remove the row 2 string from current set
                    current.erase(*it1);
                }
            }
            // Remove the row 1 string from current set
            current.erase(*it);
        }
    }
    // If we come out of these 3 loops then it means there was no 
    // possible match for this string
    return false;           
}

int main()
{
    const char* file = "input.txt";
    ifstream inFile(file);

    string word;

    // Read the words and store them in array of lists
    // Basically radix sort them based on their first character
    // so all the words with 'a' appear in the same list 
    // i.e. words[0]

    if(inFile.is_open())
    {
        while(inFile.good())
        {
            inFile >> word;
            if(word[0] >= 'a' && word[0] <= 'z')
            {
                int index1 = word[0]-'a';
                words[index1].push_back(word);
            }
        }
    }
    else
        cout<<"The file could not be opened"<<endl;


    // Call the findMatrix function for each string in the list and
    // stop when a true value is returned

    int i;
    for(i=0;i < 26;i++)
    {
        it = words[i].begin();
        while(it!=words[i].end())
        {
            if(findMatrix(*it))
            {
                // Output the matrix
                for(int j=0;j<3;j++)
                    cout<<out[j]<<endl;

                // break out of both the loops
                i=27;
                break;
            }
            it++;
        }
    }

    // If i ==26 then the loop ran the whole time and no break was
    // called which means no match was found

    if(i==26)
        cout<<"Matrix does not exist"<<endl;

    system("pause");
    return 0;
}

I have tested the code on a small set of strings and it worked fine.

Sachin
  • 3,672
  • 9
  • 55
  • 96