Finding every possible word out of a bigger word

Question

Hi I'm looking for an algorithm to extract every possible word out of a single word in C++.
For example from the word "overflow" I can get these : "love","flow","for","row,"over"...
So how can I get only valid english words efficiently.
Note: I have a dictionary, a big word list.

Is there a specific part of the algorithm that you're having trouble with, or do you just want someone to do the entirety of the work for you? — dwanderson, Mar 14 '15 at 22:11
Anagramme algo : http://stackoverflow.com/q/13692221/4660239 — Florian Prud'homme, Mar 14 '15 at 22:12
I just don't know how to do it , I have thought about finding all possible combinations and checking them but it will take a lot of time for words bigger than 5 characters . — docfed, Mar 14 '15 at 22:16
I basically want to do what this website does [here](http://www.thewordfinder.com/anagram-solver/) — docfed, Mar 14 '15 at 22:19
If you want an algorithm that's faster than the one you're thinking of, it can probably be made by modifying the one you're thinking of. — Drew Dormann, Mar 14 '15 at 22:24
If you're looking for a general algorithm - edit the C++ tag out. You'll get more responses and less people mad at the fact you didn't include any code. Then - however - you should provide information about the algorithms you've already found. If you're looking for code - provide us with what you've already done yourself. — Paweł Stawarz, Mar 14 '15 at 22:36

Galik · Accepted Answer · 2015-03-15T16:07:56.937

I can't think how to do this without bruit-forcing it with all the permutations.

Something like this:

#include <string>
#include <algorithm>

int main()
{
    using size_type = std::string::size_type;

    std::string word = "overflow";

    // examine every permutation of the letters contained in word
    while(std::next_permutation(word.begin(), word.end()))
    {
        // examine each substring permutation
        for(size_type s = 0; s < word.size(); ++s)
        {
            std::string sub = word.substr(0, s);

            // look up sub in a dictionary here...
        }
    }

    return 0;
}

I can think of 2 ways to speed this up.

1) Keep a check on substrings of a given permutation already tried to avoid unnecessary dictionary lookups (std::set or std::unordered_set maybe).

2) Cache popular results, keeping the most frequently requested words (std::map or std::unordered_map perhaps).

NOTE: It turns out even after adding cashing at various levels this is indeed a very slow algorithm for larger words.

However this uses a much faster algorithm:

#include <set>
#include <string>
#include <cstring>
#include <fstream>
#include <iostream>
#include <algorithm>

#define con(m) std::cout << m << '\n'

std::string& lower(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(), tolower);
    return s;
}

std::string& trim(std::string& s)
{
    static const char* t = " \t\n\r";
    s.erase(s.find_last_not_of(t) + 1);
    s.erase(0, s.find_first_not_of(t));
    return s;
}

void usage()
{
    con("usage: anagram [-p] -d <word-file> -w <word>");
    con("    -p             - (optional) find only perfect anagrams.");
    con("    -d <word-file> - (required) A file containing a list of possible words.");
    con("    -w <word>      - (required) The word to find anagrams of in the <word-file>.");
}

int main(int argc, char* argv[])
{
    std::string word;
    std::string wordfile;
    bool perfect_anagram = false;

    for(int i = 1; i < argc; ++i)
    {
        if(!strcmp(argv[i], "-p"))
            perfect_anagram = true;
        else if(!strcmp(argv[i], "-d"))
        {
            if(!(++i < argc))
            {
                usage();
                return 1;
            }
            wordfile = argv[i];
        }
        else if(!strcmp(argv[i], "-w"))
        {
            if(!(++i < argc))
            {
                usage();
                return 1;
            }
            word = argv[i];
        }
    }

    if(wordfile.empty() || word.empty())
    {
        usage();
        return 1;
    }

    std::ifstream ifs(wordfile);

    if(!ifs)
    {
        con("ERROR: opening dictionary: " << wordfile);
        return 1;
    }

    // for analyzing the relevant characters and their
    // relative abundance

    std::string sorted_word = lower(word);
    std::sort(sorted_word.begin(), sorted_word.end());

    std::string unique_word = sorted_word;
    unique_word.erase(std::unique(unique_word.begin(), unique_word.end()), unique_word.end());

    // This is where the successful words will go
    // using a set to ensure uniqueness
    std::set<std::string> found;

    // plow through the dictionary
    // (storing it in memory would increase performance)
    std::string line;
    while(std::getline(ifs, line))
    {
        // quick rejects

        if(trim(line).size() < 2)
            continue;

        if(perfect_anagram && line.size() != word.size())
            continue;

        if(line.size() > word.size())
            continue;

        // This may be needed if dictionary file contains
        // upper-case words you want to match against
        // such as acronyms and proper nouns
        // lower(line);

        // for analyzing the relevant characters and their
        // relative abundance

        std::string sorted_line = line;
        std::sort(sorted_line.begin(), sorted_line.end());

        std::string unique_line = sorted_line;
        unique_line.erase(std::unique(unique_line.begin(), unique_line.end()), unique_line.end());

        // closer rejects

        if(unique_line.find_first_not_of(unique_word) != std::string::npos)
            continue;

        if(perfect_anagram && sorted_word != sorted_line)
            continue;

        // final check if candidate line from the dictionary
        // contains only the letters (in the right quantity)
        // needed to be an anagram

        bool match = true;
        for(auto c: unique_line)
        {
            auto n1 = std::count(sorted_word.begin(), sorted_word.end(), c);
            auto n2 = std::count(sorted_line.begin(), sorted_line.end(), c);

            if(n1 < n2)
            {
                match = false;
                break;
            }
        }

        if(!match)
            continue;

        // we found a good one
        found.insert(std::move(line));
    }

    con("Found: " << found.size() << " word" << (found.size() == 1?"":"s"));
    for(auto&& word: found)
        con(word);
}

Explanation:

This algorithm works by concentrating on known good patterns (dictionary words) rather than the vast number of bad patterns generated by the permutation solution.

So it trundles through the dictionary looking for words to match the search term. It successively discounts the words based on tests that increase in accuracy as the more obvious words are discounted.

The crux logic used is to search each surviving dictionary word to ensure it contains every letter from the search term. This is achieved by finding a string that contains exactly one of each of the letters from the search term and the dictionary word. It uses std::unique to produce that string. If it survives this test then it goes on to check that the number of each letter in the dictionary word is reflected in the search term. This uses std::count().

A perfect_anagram is detected only if all the letters match in the dictionary word and the search term. Otherwise it is sufficient that the search term contains at least enough of the correct letters.

Change the first input to "stackoverflow" and you have an unreasonable execution time for your algorithm. — Captain Giraffe, Mar 14 '15 at 22:42
this is similar to what i am doing but it takes so much time it's unusable. I think the problem is in my dictionary lookup algorithm. — docfed, Mar 14 '15 at 23:13
@user2507237 You are right this is very slow even after adding caching. But I was hooked on this problem and wrote a much faster solution here: http://ideone.com/JSKFQP — Galik, Mar 15 '15 at 12:53
Well That's a fast algorithm I just tried it , can you post another answer with an explanation of it (sorry I'm really just a beginner), if not just post it here so it might be useful to others. and thanks that was exactly what I wanted. — docfed, Mar 15 '15 at 13:31
@user2507237 Unfortunately the question is closed as off-topic for SO so I can't post another answer. But glad its useful to you. — Galik, Mar 15 '15 at 14:28

Finding every possible word out of a bigger word

1 Answers1