2

I am looking for an efficient algorithm to do the following.

Given a table/dictionary of letter counts

counts = {'a': 3, 'b': 2, 'd': 2, 'e': 2, 'k':2, 'r': 2 }

and a list of M words

words = ['break, 'add', 'build' ... ]

write a function that finds N words using all the letters (and words from the list)

>>> find_words(counts, words, 3)
['break', 'break', 'add']

The naive approach of going over all the words N times is too slow (I think it's O(m^n)).

Mike D
  • 365
  • 3
  • 10
  • You used `break` twice but appears only once in the word list. Is it ok? Also, please give that complete word list of words. – nice_dev Oct 30 '18 at 20:08
  • yes it can contain the same word multiple times. The list of words doesn't matter it can be anything – Mike D Oct 30 '18 at 20:26
  • ok, is it fine if the count of any individual character is more than what is there in `counts`? For example- let's say `break` was `brrreeaak`. As you can see, `r` is `2` in `counts` but is `3` in `brrreeaak`. – nice_dev Oct 30 '18 at 20:43
  • no the total of all the letters of the N output words must sum to the `counts`. `brrreeaak` would be invalid as this word itself has 3 Rs while you need the sum of the Rs in all 3 words to be 2. – Mike D Oct 30 '18 at 20:47

1 Answers1

1

This problem has a strongly NP-complete feeling to it. I therefore would expect there to be no truly efficient solution.

I would therefore suggest that you try reducing it to a known problem and solve that instead. In particular you can convert this to an integer linear programming problem. Solvers for those are often surprisingly good. And per Python Mixed Integer Linear Programming there are a number that are available to you relatively easily.

The specific problem that you are trying to solve is this. You are trying to find a vector of m integers x_i such that 0 ≤ x_i, sum(x_i) ≤ n, and for each letter l, the sum of how many times the letter has been used does not exceed your count. The objective function that you would like to maximize is the sum of x_i * (1 + len(w_i)) where w_i is the i'th word.

This will always produce an answer. The answer that you get will represent a solution to your problem if and only if the objective function is n plus the count of letters available. If the objective function is less than that, then no solution exists.

btilly
  • 43,296
  • 3
  • 59
  • 88