3

My main idea is to find an algorithm ( Java ) that takes the random letters which someone has typed in a JoptionPane for instance and then instantly by pressing "Find words" i would like the program to derive all those words that match my letters from a dictionary stored in a .txt file.

I am struggling to find that algorithm.

For example:

Take into consideration that we got the following letters in a Scrabble match:

a , o ,p, t, e, z, e, w

I would like to find a java code or at least an algorithm in order to find from an english dictionary .txt file all the words that have those letters but not anything else. if I type " a, p, p" I want to have as a result the word "app" and not (app"s"). So ... to sum up, how can i compare those letters from words stored in a .txt file and as a result get specific words that match in my given letters ?

Kara
  • 6,115
  • 16
  • 50
  • 57
Ane
  • 29
  • 6
  • 2
    ... What have you tried so far? Any code available? – Tassos Bassoukos Feb 20 '14 at 18:16
  • 1
    please show some code to show your development in making this algorithm. Or atleast your thought process in how to accomplish it. – Shrey Feb 20 '14 at 18:19
  • I feel that this could be done better in some other language. – Epiglottal Axolotl Feb 20 '14 at 18:20
  • Prolog comes in mind :) Here's a quick approach: generate hash for every permutation of the input alphabet, that's `len([A,L,E,P,P])!` in your case - and assuming you have a prebuilt hash table for every word in `.txt` file, you then just have to do 120 O(1) lookups to it. – nullpotent Feb 20 '14 at 18:23
  • possible duplicate of [Anagram Algorithm using a hashtable and/or tries](http://stackoverflow.com/questions/19600442/anagram-algorithm-using-a-hashtable-and-or-tries) – hatchet - done with SOverflow Feb 20 '14 at 18:28
  • 1
    This isn't an exact anagram. Remember, it should match leap as well. – McLovin Feb 20 '14 at 18:32
  • 1
    @user3295442 - that does make the question not an exact duplicate of the link above, but a solution for this variant is similar. There will be an additional step of repeating for each permutation of sorted letter sets in the search string. Since only unique sorted letter sets are considered, that will be a lot less than all letter arrangements. – hatchet - done with SOverflow Feb 20 '14 at 18:37
  • @hatchet user3295442 is not the OP. – Marko Topolnik Feb 20 '14 at 18:39

2 Answers2

3

There are different ways to do this, depending on how efficient you want it.

An easy but less efficient way is, to take the string and go through the whole dictionary-file, checking each line if the requirement is met: checking for each character of the input whether it is present in the dict-file-line (make a temp-copy of it and remove chars from it so that each available letter can only be used once).

A harder but efficient way is, to pre-process the dictionary-file into a Trie (Prefix-Tree) [wikipedia]. You can then use all permutations of the input-string as a roadmap through the Trie.

EDIT: note as Marko Topolnik points out that computing all permutations of the input-string would be expensive - so to avoid that: at every step you only check which letters are still available from the input-string AND of those you only keep which are available as next-branches in the Trie.

Bernd Elkemann
  • 23,242
  • 4
  • 37
  • 66
  • 1
    But the permutation count explodes with string length. That doesn't at all seem like a good recourse. Sorting the chars in the string, which will eliminate the excess degree of freedom, seems the best way. – Marko Topolnik Feb 20 '14 at 18:35
  • @MarkoTopolnik you dont need to compute the permutations: at every step you only check which letters are still available. **AND** for each of those you only keep those that are available as next-branches in the Trie. – Bernd Elkemann Feb 20 '14 at 18:36
  • But you still have a messy path through the trie, with backtracking. Searching for a sorted string is clearly superior, but it needs a custom Trie, which holds all the actual entries that sort to the string at each position. – Marko Topolnik Feb 20 '14 at 18:38
  • @MarkoTopolnik true, it will backtrack, but I think that can not be avoided. yes, sorting comes to mind but the problem is, that not all letters of the input need to be used. – Bernd Elkemann Feb 20 '14 at 18:41
  • That's a minor concern: you'll just have more matches, but still no backtracking. Basically, each node you visit is a match. – Marko Topolnik Feb 20 '14 at 18:43
  • @MarkoTopolnik we have to backtrack if the input contained letters that we cannot use: e.g. if the input is EXRA and the dict contains AXE: the sorted input is AERX and the dict will contain AEX. we will find 'A' (first match), the sorted-Trie will tell us there's a path AE but no match, then we tell it to go AER and AERX (again it will tell that those could not be found); we should backtrack to discard the R and allow to find AEX. – Bernd Elkemann Feb 20 '14 at 18:52
  • I was just putting up an answer using a trie and sorted letter lists but realized that you still have to work through all combinations of letters (and lengths of "words"). This would be more efficient than having to evaluate all permutations of letters but the added complexity doesn't buy very much unless the words are fairly long – NealB Feb 20 '14 at 19:03
  • Actually, there's no backtracking in what you describe: you just find there's no AER branch and move on. But it's still true there would be backtracking if the trie contained both the AER branch and the AEX branch: then you'd have to visit both and not just take the one branch which matches at each node. – Marko Topolnik Feb 20 '14 at 19:34
  • @MarkoTopolnik I think we can assume that such AER branches exist (and they can contain large sub-trees) – Bernd Elkemann Feb 20 '14 at 19:36
1

This can done in following manner:-

1.First check the exact word is in dictionary or not.If it exist then you can store them in array or list as you want and display it.for ex:-
By typing "app" in a JOptionPane it will show apple or apps and more related words.
2.If it is wrong means not matching to any word in dictionary then apply edit distance.

Devavrata
  • 1,785
  • 17
  • 30
  • How does looking up the exact word find you words starting with / containing those letters and/or related words? Or was "By typing 'app' ..." supposed to fall under / after the second point? You want to check the edit distance to each other word? That would be incredibly expensive and, since letter order doesn't matter and you can't insert letters, that would be way overcomplicating it. – Bernhard Barker Feb 21 '14 at 02:16
  • I gave only solution which I know!!! – Devavrata Feb 21 '14 at 08:32
  • How I am able to do the "check"? Do you have an algorithm in your mind? In java? My thought is when i type "a p p l e" to show me the words: app , apple , leap and not the word "apps" only if i give the extra letter "s". – Ane Feb 23 '14 at 17:13