Is it a good idea to store words of a dictionary with 100.000 words in a static array of string. I'm working on spellchecker and I thought that way would be faster.
6 Answers
Definitely its not a good idea to store so many strings as an array especially if you are using it for spell check which means you will have to search for and compare strings. It would make it inefficient to search or compare a string through the array as it would always be a linear search

- 10,073
- 4
- 31
- 45
-
1Almost anyone would alphabetize and binary search on an array. – sje397 Jul 27 '10 at 11:41
-
1+1: Big O is no always relevant, but when n = 100k, O(n) equals "slow as hell". – Jul 27 '10 at 11:41
-
100k isn't all that large an N; the problem isn't a linear search through 100k entries, but that each entry is going to require a string comparison. +1 to using a HashSet, at this scale. Consider using either a Trie, a B+Tree, a ISAM-Tree, or a String-BTree once you start looking at an order or two of magnitude larger than this. – Recurse Jul 28 '10 at 02:17
You should generally prefer a Java Collections Framework class to a native Java array for anything non-trivial. In this particular case, what you have is a Set<String>
(since no words should appear more than once in the dictionary).
A HashSet<String>
offers constant time performance for the basic operations add
, remove
, and contains
, and should work very well with String
hashcode formula.
For larger dictionaries, you'd want to use more sophisticated data structures specialized for storing a set of strings (e.g. a trie), but for 100K words, a HashSet
should suffice.
See also
- Java Tutorials/Collections Framework
- Effective Java 2nd Edition, Item 25: Prefer lists to arrays

- 376,812
- 128
- 561
- 623
-
3+1 for mentioning Trie, since it's an efficient data structure to use within the mentioned problem field, i.e. spell checking. – Giuseppe Accaputo Jul 27 '10 at 11:59
-
For trie implementation in Java: http://stackoverflow.com/questions/623892/where-do-i-find-a-standard-trie-based-map-implementation-in-java – polygenelubricants Jul 27 '10 at 12:07
How about an approach with in memory database technology like for example sqlite inmemory This allows you to use efficient querying without disk overhead

- 734
- 1
- 4
- 21
I think 100 000 is not so large amount that search wolud be inefficent. Of course it depends ... It would work nice if you are checking if a word exists in array - it's a linear complexity algorithm. You can keep table ordered so you can use quicksort search algoritm and make it more efficent.
On the other hand - if you wold like to find, 5 most likely words (using N-gram method or something) you should consider using Lucene or other text database.

- 12,080
- 13
- 60
- 91
Perhaps using an SQLite database would be more efficient ? I think that's what firefox/thunderbird does for spell checking but I'm not entirely sure.

- 3,095
- 7
- 27
- 31
You won't be able to store that amount of strings in a static variable. Java has a size limit for static code and even method bodies. Simply use a flatfile and read it upon class instanciation - Java is faster than most people think with those things.
See Enum exeeding the 65535 bytes limit of static initializer... what's best to do?.

- 1
- 1

- 3,190
- 1
- 33
- 47
-
1dear downvoter: although I agree that this is not a very elegant answer, I do think that a downvote to a serious (non-troll) answer deserves a comment. – Sean Patrick Floyd Jul 27 '10 at 12:35