1

I would like to create a dictionary containing about 10 000 word pairs in Java, but I don't know what data structure I should use. If I have a word in my dictionary, for example because, I would like to the program find it if I search for only bec. If I have a phrase like the end I would like to find it if I search for th or en.

I tried ArrayList, but search is pretty slow. I don't want to use classes that implement the Map interface because they can only store one value for one key, so I can't search as described above.

This answer list some data structures for dictionaries, but I don't think they are the best for me: Best data structure for implementing a dictionary?

Community
  • 1
  • 1
racz16
  • 198
  • 2
  • 8
  • 1
    You can store one value per key in a `Map`, but nothing prevents you from having that value as a `Set` or `List`, or even another `Map`... – Mena Dec 02 '15 at 10:59
  • Why dont you use a sql database? Then you could use the `LIKE` operator. – Murat Karagöz Dec 02 '15 at 11:00

3 Answers3

1

What you are searching for is a trie.

As the java framework does not seem to have an implementation of one, take a look at this thread for possible libraries and solutions:

  • explanations and basic java implementation in Robert Sedgewick's book "Algorithms"
  • explanations and basic java implementation on Patel's blog
  • explanations and basic java implementation on an oracle thread
  • java library "Concurrent Radix and Suffix Trees for Java" on GitHub
  • java library "Practical Algorithm to Retrieve Information Coded in Alphanumeric (PATRICIA)" on GitHub
  • a java library by brianfromoregon on GitHub
  • Community
    • 1
    • 1
    Binkan Salaryman
    • 3,008
    • 1
    • 17
    • 29
    • 1
      Could you add a summary of the link to the answer? Links have a tendency to die after a while. – Holloway Dec 02 '15 at 13:20
    • 1
      It's fine if I only store words. But if I store for example "the end", a trie will only list "the end" (and it's meaning) if I type "t", "th" or "the". But I want to list it if I type "e", "en" or "end". So I want to list phrases if any of it's words starts with the letters I typed to the search area. – racz16 Dec 02 '15 at 13:27
    • @racz16 I don't think there's a structure for optimized ``String#contains`` checks, but depending on the phrase length you could check for its substrings, e.g. "the end", "he end", "e end", " end", "end", "nd", "d", "". – Binkan Salaryman Dec 02 '15 at 13:41
    • Finally, I implemented a trie, and it works fine, it's pretty fast. – racz16 Dec 05 '15 at 21:48
    0

    You can use NavigableSet which allows you to do partial lookups.

    NavigableSet<String> words = new TreeSet<>();
    words.add("tee");
    words.add("the");
    words.add("there");
    words.add("tidy");
    
    String th = words.higher("th");
    System.out.println("th ... "+th);
    

    prints

    th ... the
    

    If you want multiple words you can do

    NavigableSet<String> words = new TreeSet<>();
    words.add("tee");
    words.add("the");
    words.add("their");
    words.add("there");
    words.add("tidy");
    
    String start = "th";
    for (String w : subSet(start, start + '\uffff')) {
        System.out.println(start + " ... " + w);
    }
    

    which prints

    th ... the
    th ... their
    th ... there
    

    You can use a separate map to look up phrases by word.

    Note: This would be between 1000x and 10000x faster than using an SQL database.

    Peter Lawrey
    • 525,659
    • 79
    • 751
    • 1,130
    • I don't understand perfectly what you said, can you explain it a bit more detailed? It only stores a word, not a word with it's meaning, and if I type "th" I want to list all words starts with "th", so not only "the", but also "there" should be listed. – racz16 Dec 02 '15 at 13:19
    • @racz16 I have updated my answer to include multiple results. – Peter Lawrey Dec 02 '15 at 14:47
    0
    1. Use simple array
    2. Sort the array
    3. Search with binary search

    This is the fastest solution if you fill the dictionary once and then do only searches.

    Words that begin with same letters would be stacked together next to each other.

    Additional tree indexes are only useful if the data is large enough.

    dimm
    • 1,792
    • 11
    • 15