Which datastructure is appropriate for this situation?

Question

I'm trying to decide which datastructure to use to store key-value pairs when only features needed are

insertion
lookup

Specifically, I don't need to be able to delete pairs, or iterate through keys/values/pairs.

The keys are integer tuples, the values are pointers (references, whatever). I'm only storing a couple million pairs spread out over (many) objects.

Currently I'm considering using either

a hash table
a kd-tree
a b-tree

I'm leaning toward the hash table (for the O(1) insertion/lookup time), but I wanted to confirm my leanings.

Which structure (of those above or other) would you recommend and why? If you would recommend a hash table, should I create a separate table for each object, or just create a single table and use the object's id as part of the key tuple?

A kd-tree is a spatial data structure - it doesn't make any sense to use it in this situation. Did you mean a red-black tree? — Greg Rogers, Aug 20 '09 at 14:28
How are you going to be using this data structure? Are you doing all the inserts up front, or intermingling them with the lookups? How many lookups do you expect vs insertions? Are the keys unique? How many keys are there (2^64?). How are they distributed? — Dolphin, Aug 21 '09 at 03:31

Andrew Hare · Accepted Answer · 2009-08-20T14:05:14.480

4

A hashtable will be the best choice here as all the operations that matter to you are O(1) (and as such you shouldn't need to worry about creating multiple hashtables).

edited Aug 20 '09 at 14:05

answered Aug 20 '09 at 14:00

Andrew Hare

344,730
71
640
635

O(1) vs O(log n) be aside -- I've (anecdotally) read that hash-maps only perform better above some N as the constant is high enough to discourage its use for some cases. A couple million pairs sounds high enough for me, but do you some numbers? Or is this so much implementation dependent that only profiling will help anyways? – gimpf Aug 20 '09 at 14:03

score 1 · Answer 2 · answered Aug 20 '09 at 14:02

I'm a big fan of hash tables, since they are easy and there are implementations available for pretty much every major language out there. The O(1) insertion/lookup is an especially good feature.

You should probably use a single table, to save on memory. Hash tables are notoriously inefficient memory-wise, and using a single table would help to minimize that.

score 1 · Answer 3 · answered Aug 20 '09 at 14:05

1

Hash tables would be usefull here and I see no reason to have more than one table.

answered Aug 20 '09 at 14:05

Jambobond

619
3
12
23

1

this post has a background of hash tables http://stackoverflow.com/questions/371136/binary-trees-vs-linked-lists-vs-hash-tables – Jambobond Aug 20 '09 at 14:12

score 0 · Answer 4 · answered Aug 20 '09 at 14:05

0

Most trees have an O(n ln n) lookup time, but hashtables have an O(1) lookup time, so that's the one you want to use. It's also very common, and often the implementation is highly-optimised to boot.

answered Aug 20 '09 at 14:05

thecoop

45,220
19
132
189

Which datastructure is appropriate for this situation?

4 Answers4