This is an old question, so you've probably already found some heuristics you like. I came across it while exploring ways to generate perfect pangrams, which by definition use the fewest possible characters (each letter of the alphabet appears exactly once). Anyway, for future finders like myself:
I wrote a program that has had some success. I treated this problem more like graph search than set cover and used A* as the starting point for the algorithm. You can explore the code on GitHub.
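To give a feel for the framing (this is a minimal, greedy best-first sketch of my own, not the actual code in the repo, and the function and variable names are made up for illustration): a state is the set of letters still unused, and expanding a state means picking a word whose letters are all still needed, which is the perfect-pangram rule.

```python
import heapq
from itertools import count

ALPHABET = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def best_first_pangram_search(letter_sets, heuristic):
    """letter_sets: list of frozensets, one per compressed word.
    heuristic: scores a remaining-letter set; lower = more promising
    (negate any "higher is better" score before passing it in)."""
    tie = count()  # tiebreaker so the heap never has to compare states
    frontier = [(heuristic(ALPHABET), next(tie), [], ALPHABET)]
    seen = {ALPHABET}
    while frontier:
        _, _, chosen, remaining = heapq.heappop(frontier)
        if not remaining:                    # all 26 letters covered
            return chosen
        for letters in letter_sets:
            if letters <= remaining:         # word uses only still-needed letters
                nxt = remaining - letters
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(
                        frontier,
                        (heuristic(nxt), next(tie), chosen + [letters], nxt))
    return None
```

The heuristics described below are what get plugged in as the priority function.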
The things that helped the most were:
Compress the State Space
I took a dictionary and transformed each word into its sorted letter set, so that, for example, "BAD" and "DAB" are both stored as "ABD". The compressed dictionary I used took ~250,000 words down to ~31,000 unique letter combos, which is a massive win.
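A sketch of that compression step (my own naming and normalization, not the repo's):

```python
from collections import defaultdict

def compress_dictionary(words):
    """Group words by their sorted letter set, e.g. "BAD" and "DAB"
    both land under the key "ABD"."""
    by_letter_set = defaultdict(list)
    for word in words:
        key = "".join(sorted(set(word.upper())))
        by_letter_set[key].append(word)
    return by_letter_set

# compress_dictionary(["bad", "dab", "cab"])
# -> {"ABD": ["bad", "dab"], "ABC": ["cab"]}
```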
Heuristics
As mentioned elsewhere, this problem is NP-hard, so I turned to heuristics. The three I'm currently using are:
Vowel Ratio
When I examine the letters remaining after picking a word, I compute #vowels / #unusedLetters. The motivation is simple: having more vowels remaining makes it more likely that I'll be able to select words using those letters.
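As a rough illustration (my own sketch, not the repo's code):

```python
VOWELS = set("AEIOU")

def vowel_ratio(remaining_letters):
    """Fraction of the unused letters that are vowels; higher suggests it
    will be easier to keep forming words from what is left."""
    if not remaining_letters:
        return 0.0
    return sum(1 for c in remaining_letters if c in VOWELS) / len(remaining_letters)
```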
Letter Commonality
When I read in the initial word set, I build a dictionary keyed by letter and count the number of times each letter appears across all the words. I then use this dictionary to prefer nodes whose remaining letters are more common. (I believe the OP mentioned this one in the comments.)
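Sketched out (again, names are mine, and this assumes the compressed letter-set dictionary from above):

```python
from collections import Counter

def build_letter_counts(letter_sets):
    """Count how often each letter appears across the compressed word list."""
    counts = Counter()
    for letters in letter_sets:
        counts.update(letters)
    return counts

def commonality_score(remaining_letters, letter_counts):
    """Higher when the letters still unused are common across the dictionary,
    i.e. many words remain available to cover them."""
    return sum(letter_counts[c] for c in remaining_letters)
```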
Shared 3-Letter Combos
This is similar to the letter commonality heuristic. Again, when processing the initial word set, I create a dictionary containing every 3-letter combination that can be made from each letter set. For example, the letter set ABC has only one valid combo, but ABCD has [ABC, ABD, ACD, BCD]. Remember, I only care about sorted letter sets after compressing the initial word set.
In the end, much like the letter commonality measure, I have a dictionary mapping the 26-choose-3 possible letter triples to the number of times each appears across my word set. I then use this to prefer searching nodes whose remaining letters form more valid 3-letter combos.
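A possible sketch of this one (my own naming; not taken from the repo):

```python
from collections import Counter
from itertools import combinations

def build_trio_counts(letter_sets):
    """Count how often each sorted 3-letter combination appears across the
    compressed word list, e.g. "ABCD" contributes ABC, ABD, ACD and BCD."""
    counts = Counter()
    for letters in letter_sets:
        counts.update(combinations(sorted(letters), 3))
    return counts

def trio_score(remaining_letters, trio_counts):
    """Higher when the unused letters still form many 3-letter combos that
    actually occur somewhere in the dictionary."""
    return sum(trio_counts[trio]
               for trio in combinations(sorted(remaining_letters), 3))
```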