0

I am trying to create a program that looks through all the permutations of a string and then prints out all the valid words. I can get all the permutations, but checking if a word is in a dictionary text files takes around 3 seconds. When I tried 7 letters it took 47:19. Is there any way to read from a file faster?
Any help would be appreciated.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>




bool in(std::string arr, char element)
{
    for (int i = 0; i < arr.size(); i++)
    {
        if (arr[i] == element)
        {
            return true;
        }
    }

    return false;
}


bool inDictionary(std::string str)
{
    std::ifstream dictionary;
    dictionary.open("words.txt");
    if (dictionary.is_open())
    {
        std::string word;
        while (std::getline(dictionary, word))
        {
            if (word == str)
            {
                return true;
            }
        }
    }
    return false;
}




std::string remainingCharacters(std::string orginal, std::string newString)
{
    std::string characters = "";
    for (int i = 0; i < orginal.size(); i++)
    {
        if (!in(newString, orginal[i]))
        {
            characters += orginal[i];
        }
    }
    





    return characters;
}


void combinations(std::string cur, std::string original, std::vector<std::string>& permutations)
{

    for (int i = 0; i < remainingCharacters(original, cur).size(); i++)
    {
        
        permutations.push_back(cur + remainingCharacters(original, cur)[i]);
        combinations(cur + remainingCharacters(original, cur)[i], original, permutations);
    }
}

int main()
{

    std::vector<std::string> permutations;
    combinations("", "wdsrock",  permutations);
    for (int i = 0; i < permutations.size(); i++)
    {
        if (inDictionary(permutations[i]))
        {
            std::cout << permutations[i] << ", ";
        }
    }
;
}
   
sourab maity
  • 1,025
  • 2
  • 8
  • 16
Jaden
  • 55
  • 6
  • 5
    If the dictionary isn't too big, you can read it all into memory, i.e., cache it, ahead of time prior to your `inDictionary` lookups. – Turix Dec 28 '20 at 04:57
  • 5
    If you want to find 3 words, you need to call `inDictionary` 3 times, each time opening a file, and starting from the beginning of the file? What if you had a list of 1000 words to search for? Simply stated, your current approach is highly flawed. Wouldn't it make more sense to read the file once, store the information somewhere, and search that particular storage, whether it is an array, hash table, etc? – PaulMcKenzie Dec 28 '20 at 04:59
  • 3
    A better question than making your slow step faster might be asking how you can invoke the slow step less often. Is this (loop through the permutations; in each iteration loop through the file) the only approach you can think of? I'll offer a challenge: try to accomplish your goal with only one pass through your file and no need to call `combinations()`. – JaMiT Dec 28 '20 at 05:10
  • @JaMiT Thanks, I stored the text file into a unordered map at the beginning of the program, and used std::unordered_map::find() to quickly iterate through the file. it takes under a second now. one last questions, is it best to use a unordered map, or should I use an array? – Jaden Dec 28 '20 at 05:50
  • Depends on what you're doing and how much data you have. Imagine you have a thousand items in your dictionary and have to `if (arr[i] == element)` all of them all of the time. Eeeeyuck. `unsorted_map` uses a smarter algorithm for looking things up, it turns your input into a number and uses that number as an array index. Sometimes it has to do a bit more work, too many inputs map to the same number and the algorithm has to figure out which of many items at the array index is the correct one, but usually the main cost is converting the input into the index and it only happens once. – user4581301 Dec 28 '20 at 06:00
  • What makes you think that the reading is the bottle neck? From your description it seems to me that the writing will take most of the time. – Yunnosch Dec 28 '20 at 07:30

2 Answers2

3

There are ways to reduce number of read operations, and number of permutations tested:

  • Store dictionary in memory
  • associate word letters to all possible permutation (anagrammer).
  • iterate over combination instead of permutation

so

std::unordered_map<std::string, std::vector<std::string>> read_dictionary()
{
    std::ifstream dictionary;
    dictionary.open("words.txt");
    if (!dictionary.is_open()) { throw std::runtime_error("No dictionary"); }
    std::unordered_map<std::string, std::vector<std::string>> res;
    std::string word;
    while (std::getline(dictionary, word))
    {
        auto anagram = word;
        std::sort(anagram.begin(), anagram.end());

        res[anagram].push_back(word);
    }
    return res;
}

template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
    /* Credits: Thomas Draper */
    if ((first == last) || (first == k) || (last == k))
        return false;
     Iterator itr1 = first;
     Iterator itr2 = last;
     ++itr1;
     if (last == itr1)
         return false;
     itr1 = last;
     --itr1;
     itr1 = k;
     --itr2;
     while (first != itr1)
     {
         if (*--itr1 < *itr2)
         {
             Iterator j = k;
             while (!(*itr1 < *j)) ++j;
             std::iter_swap(itr1,j);
             ++itr1;
             ++j;
             itr2 = k;
             std::rotate(itr1,j,last);
             while (last != j)
             {
                 ++j;
                 ++itr2;
             }
             std::rotate(k,itr2,last);
             return true;
        }
    }
    std::rotate(first,k,last);
    return false;
}

int main()
{
    const auto dictionary = read_dictionary();
    std::string letters = "dsrock";
    std::sort(letters.begin(), letters.end());

    for (std::size_t i = 1; i != letters.length() + 1; ++i) {
        do {
            auto it = dictionary.find(letters.substr(0, i));
            if (it != dictionary.end()) {
                for (const auto& word : it->second) {
                    std::cout << word << std::endl;
                }
            }
        } while (next_combination(letters.begin(), letters.begin() + i, letters.end()));
    }
}

Demo

Jarod42
  • 203,559
  • 14
  • 181
  • 302
1
  1. Best way is to map file to memory and read it . Boost library provides API to read into memory mapped files
#include <iostream>
#include <boost/iostreams/device/mapped_file.hpp>
using namespace std;
int main()
{
boost::iostreams::mapped_file_params arg;
arg.path = "yourfile.txt";
arg.new_file_size = pow(1024, 2); // 1 MB
boost::iostreams::mapped_file::mapmode::readonly;
boost::iostreams::mapped_file mf;
mf.open(arg);
char* bytes = (char*) mf.const_data();
cout << bytes << endl;
mf.close();
return 0;
}
  1. Another way if u don't want memory mapped files is to read blocks of data . Reading blocks of data is faster than reading character individually
ifstream is(filename,readmode);
if (is) {
char* buffer = new char[length+1];
is.read(buffer, length);
buffer[length] = '\0';
is.close();
user265906
  • 120
  • 1
  • 8
  • Both are options if you have a LOT of data in those files and the files are well-organized and easy to search. Otherwise. Slow. slow. slow. Notes: Avoid using `pow` when operating on integers. `pow(1024, 2)` heads into floating point space and, due to floating point imprecision, may not convert back to the integer you expect. Should replace `char* buffer = new char[length+1];` with a `std::vector` Significantly fewer memory management woes. – user4581301 Dec 28 '20 at 06:08