1

I have to store ~50,000 English words in memory and I'd like to know what would be the best data structure in term of memory footprint (and loading speed). Would it be a Trie? How would I serialize it into a file? Is there anything better than that?

Essentially, once the ~50,000 words are loaded into memory, I simply need to check if the word exists or not.

Martin
  • 39,309
  • 62
  • 192
  • 278

4 Answers4

1

Well, according to your provided guidelines, a simple List would be better.

Fetching time would be obviously slower than a Trie or Dictionary, but

"in term of memory footprint (and loading speed)"

It will require very little memory overhead, and will load faster (as no indexes / prefix data structures are built).

See this blog post for some memory comparison details (In JavaScript, but still applies).

seldary
  • 6,186
  • 4
  • 40
  • 55
0

According to this answer, the Dictionary class is what you need. As per the MSDN documentation, you should use the TryGetValue method to access your data:

Use the TryGetValue method if your code frequently attempts to access keys that are not in the dictionary. Using this method is more efficient than catching the KeyNotFoundException thrown by the Item property.

Community
  • 1
  • 1
npinti
  • 51,780
  • 5
  • 72
  • 96
0

A Dictionary object is proposed. Read these:

Most efficient in-memory data structure for read-only dictionary access

Why is Dictionary preferred over hashtable?

For Help on implementation read this:

http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

For serializing the dictionary object or the hash table read this reference:

http://blogs.msdn.com/b/adam/archive/2010/09/10/how-to-serialize-a-dictionary-or-hashtable-in-c.aspx

Community
  • 1
  • 1
mohsensajjadi
  • 325
  • 1
  • 6
  • 18
  • Sure, if the choice is »Dictionary or Hashtable«, then Dictionary. But their question even includes the hint that they might want a Trie. – Joey Apr 30 '12 at 06:19
  • I chose a bad reference, but my intention was to show info on the Dictionary class. A better reference would be http://stackoverflow.com/questions/8570201/most-efficient-in-memory-data-structure-for-read-only-dictionary-access – mohsensajjadi Apr 30 '12 at 06:45
0

Yes, a trie sounds ok for this. For serialising you'd have two options:

  1. Use the original word list and rebuild the trie. It should be fast enough, I guess, but you may want to profile it.
  2. Just use normal .NET serialisation for the type and dump it to a file. This prevents programs in other languages from reading it, though.
Joey
  • 344,408
  • 85
  • 689
  • 683