Huge Static Array of String

Question

Is it a good idea to store words of a dictionary with 100.000 words in a static array of string. I'm working on spellchecker and I thought that way would be faster.

score 5 · Answer 1 · answered Jul 27 '10 at 11:37

5

Definitely its not a good idea to store so many strings as an array especially if you are using it for spell check which means you will have to search for and compare strings. It would make it inefficient to search or compare a string through the array as it would always be a linear search

answered Jul 27 '10 at 11:37

Gopi

10,073
4
31
45

1

Almost anyone would alphabetize and binary search on an array. – sje397 Jul 27 '10 at 11:41
1

+1: Big O is no always relevant, but when n = 100k, O(n) equals "slow as hell". – Jul 27 '10 at 11:41
100k isn't all that large an N; the problem isn't a linear search through 100k entries, but that each entry is going to require a string comparison. +1 to using a HashSet, at this scale. Consider using either a Trie, a B+Tree, a ISAM-Tree, or a String-BTree once you start looking at an order or two of magnitude larger than this. – Recurse Jul 28 '10 at 02:17

score 5 · Accepted Answer · answered Jul 27 '10 at 11:47

5

You should generally prefer a Java Collections Framework class to a native Java array for anything non-trivial. In this particular case, what you have is a Set<String> (since no words should appear more than once in the dictionary).

A HashSet<String> offers constant time performance for the basic operations add, remove, and contains, and should work very well with String hashcode formula.

For larger dictionaries, you'd want to use more sophisticated data structures specialized for storing a set of strings (e.g. a trie), but for 100K words, a HashSet should suffice.

3

+1 for mentioning Trie, since it's an efficient data structure to use within the mentioned problem field, i.e. spell checking. – Giuseppe Accaputo Jul 27 '10 at 11:59
For trie implementation in Java: http://stackoverflow.com/questions/623892/where-do-i-find-a-standard-trie-based-map-implementation-in-java – polygenelubricants Jul 27 '10 at 12:07

score 1 · Answer 3 · answered Jul 27 '10 at 11:51

1

How about an approach with in memory database technology like for example sqlite inmemory This allows you to use efficient querying without disk overhead

answered Jul 27 '10 at 11:51

dwergkees

734
1
4
21

score 0 · Answer 4 · answered Jul 27 '10 at 11:45

I think 100 000 is not so large amount that search wolud be inefficent. Of course it depends ... It would work nice if you are checking if a word exists in array - it's a linear complexity algorithm. You can keep table ordered so you can use quicksort search algoritm and make it more efficent.

On the other hand - if you wold like to find, 5 most likely words (using N-gram method or something) you should consider using Lucene or other text database.

score 0 · Answer 5 · answered Jul 27 '10 at 11:50

0

Perhaps using an SQLite database would be more efficient ? I think that's what firefox/thunderbird does for spell checking but I'm not entirely sure.

answered Jul 27 '10 at 11:50

Not Available

3,095
7
27
31

score 0 · Answer 6 · edited May 23 '17 at 12:01

0

You won't be able to store that amount of strings in a static variable. Java has a size limit for static code and even method bodies. Simply use a flatfile and read it upon class instanciation - Java is faster than most people think with those things.

See Enum exeeding the 65535 bytes limit of static initializer... what's best to do?.

edited May 23 '17 at 12:01

Community

1
1

answered Jul 27 '10 at 11:50

Daniel Bleisteiner

3,190
1
33
47

1

dear downvoter: although I agree that this is not a very elegant answer, I do think that a downvote to a serious (non-troll) answer deserves a comment. – Sean Patrick Floyd Jul 27 '10 at 12:35

Huge Static Array of String

6 Answers6

See also