4

I have a language dictionary (i.e. english,italian,etc...), that essentially is a file with one word on every line.

Now i want to create a class with a method that given a string in input check if that string exists into that dictionary.

My idea is that the method return a boolean value. In pseudocode:

boolean checkWord(String s){
    if(StringIsInDictionary) return true;
    return false
}

What should be the best way to implement that feature?

Consider that the file will contain ~65000 words.

reesjones
  • 704
  • 3
  • 9
  • 28
Ivan
  • 4,186
  • 5
  • 39
  • 72

4 Answers4

7

Read the dictionary into a Set<String> (for example, HashSet<String>), and then use set.contains(word).

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • And consider using a `HashSet` constructor which takes an `initialCapacity` parameter. http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html#HashSet(int) – Matt Ball Mar 02 '13 at 15:34
  • Is it efficient also for smartphones? – Ivan Mar 02 '13 at 15:37
2

For a space and time efficent solution (like you might use on a smartphone), consider a bloom filter. Then you won't need to store the dictionary on the phone, and checking that a string is in a dictionary will be very fast. Note that a bloom filter may return a false positive, but you can tune it to reduce that risk.

There are several open-source Java implementations of bloom filters out there. One is here https://github.com/magnuss/java-bloomfilter.

lreeder
  • 12,047
  • 2
  • 56
  • 65
  • +1, A Bloom filter is optimal for situations where memory and performance are constrained. – Joni Mar 02 '13 at 17:25
1

You will probably not want to store the words as one word per line. A better approach might be to read the file from disk only once, store the words in a HashSet (a set backed by a HashMap, which is very efficient to search), and then use set.contains("mystring"). This will, however, require the whole map to be in memory, but it will be very efficient when you need to check multiple words.

You could then even go back and serialize the set in a more efficient manner to disk, making the initial loading faster.

Polygnome
  • 7,639
  • 2
  • 37
  • 57
1

Take a look at this question, I think it could help you. Fastest way to find a string in a text file with java

Community
  • 1
  • 1
maxivis
  • 1,727
  • 3
  • 21
  • 35