1

I'm looking for a static data structure with amortised constant time associative lookup. The only operations I want to perform are lookup and construction. Also being functional is a must. I've had a look at finger trees, but I can't seem to wrap my head round them. Are there any good docs on them or, better yet, a simpler static functional data structure?

Nomad010
  • 263
  • 5
  • 8

2 Answers

1

I assume that by "functional" and "static" you mean an immutable structure, which cannot be modified after its construction; that by "lookup" you mean a dictionary-like, key-value lookup; and that by "construction" you mean the initial construction of the data structure from a given set of elements.

In that case an immutable, hash-table-based dictionary would work. The disadvantage is that insertions and removals are O(N), since the whole table has to be rebuilt, but you state that this is acceptable in your case.

Whether a datatype suitable for implementing this is available depends on your programming language. In Erlang a tuple could be used. Haskell has an immutable array in Data.Array.IArray.
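For illustration, here is a minimal sketch of such a frozen, hash-table-backed dictionary in Haskell. It is only a sketch under stated assumptions: the `hashable` package is assumed for the hash function, and `FrozenMap`, `fromList` and `lookupFM` are illustrative names, not from the original discussion. Key/value pairs are bucketed into an immutable array with one bucket per element, so lookups are expected O(1) under the usual uniform-hashing assumption:

```haskell
import Data.Array (Array, accumArray, bounds, rangeSize, (!))
import Data.Hashable (Hashable, hash)  -- from the "hashable" package (assumed)

-- Key/value pairs bucketed into an immutable array, built once and then
-- only queried. With one bucket per element, lookups are expected O(1)
-- under the usual uniform-hashing assumption.
newtype FrozenMap k v = FrozenMap (Array Int [(k, v)])

fromList :: Hashable k => [(k, v)] -> FrozenMap k v
fromList kvs = FrozenMap (accumArray (flip (:)) [] (0, n - 1) pairs)
  where
    n     = max 1 (length kvs)
    pairs = [ (hash k `mod` n, (k, v)) | (k, v) <- kvs ]

lookupFM :: (Eq k, Hashable k) => k -> FrozenMap k v -> Maybe v
lookupFM k (FrozenMap buckets) = lookup k (buckets ! (hash k `mod` n))
  where n = rangeSize (bounds buckets)
```

For example, `lookupFM "bar" (fromList [("foo", 1), ("bar", 2)])` evaluates to `Just 2`.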

Lii
  • 11,553
  • 8
  • 64
  • 88
  • Yes, you are correct, I want an immutable data structure with dictionary-like lookup. Languages aside, I can't see how Data.Array.IArray will give amortized constant lookup though. – Nomad010 Feb 24 '15 at 09:32
  • A hashtable doesn't have constant time lookup -- your hashes need to grow with the number of elements you index, otherwise your complexity will be dominated by collision resolution. Thus, the complexity of your hash grows with `N`, not to mention the cost of finding the matching hash in your hashtable. – Marcus Müller Feb 24 '15 at 09:49
  • @MarcusMüller: [The Wikipedia article on the subject](http://en.wikipedia.org/wiki/Hash_table) as well as [other sources](http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1/2771398#2771398) disagree with you. Maybe this depends on which model is used to describe the data, e.g. are keys and the numbers that represent the hashes and table indices assumed to be constant in size as _N_ grows? There seems to be broad consensus, however, that in the standard model hashtable lookups are _O(1)_ on average. – Lii Feb 24 '15 at 11:10
  • @user2383374: A `Data.Array.IArray` would (if I'm not terribly mistaken) give you an array with constant time index lookups. This could be used to implement a hashtable based dictionary which would perform as well as a normal hash table. – Lii Feb 24 '15 at 11:18
  • Ahh, thanks a lot @Lii. Although I can't seem to find any implementation details. The target language is C++. – Nomad010 Feb 24 '15 at 11:39
  • @user2383374: If the target language is C++, maybe you could take an existing dictionary datastructure and just wrap it in a functional interface? – Lii Feb 24 '15 at 11:50
  • @Lii: Unfortunately, I cannot do that easily in this case, but I will take that into consideration. Ideally, I'd like to implement something from scratch to tailor my problem, instead of just wrapping other ones. I like the finger tree, but boy could it use some simplification. – Nomad010 Feb 24 '15 at 11:57
  • 1
    Take a look at Perfect hashing (see the sketch after this comment thread). From the book Introduction to Algorithms (Cormen et al.): "Although hashing is often a good choice for its excellent average-case performance, hashing can also provide excellent worst-case performance when the set of keys is static: once the keys are stored in the table, the set of keys never changes." – Mazvél Feb 24 '15 at 12:02
  • @user2383374: In this answer I also assume that by "construction" you mean the initial construction of the datastructure from a given set of elements. – Lii Feb 24 '15 at 12:23
  • @user2383374: The whole point of finger trees (at least judging from my limited knowledge) is that they are purely functional but also offer efficient updates, and their complexity stems from this property. If you don't need to do insertions after the initial construction it might be overkill to use them. – Lii Feb 24 '15 at 12:29
  • @Lii: correct, all elements being unique strings say. Yeah, on the second point, but I can't seem to find a simpler variant. – Nomad010 Feb 24 '15 at 12:30
  • So I already have a BST structure that works. I think I'll just use that and ignore the log factor, or investigate tries as a functional data structure. – Nomad010 Feb 24 '15 at 12:35
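Following up on the perfect-hashing comment above, here is a rough Haskell sketch of the simplest variant of that idea (a single level of the FKS scheme): for a static set of n distinct keys, a table of size n² indexed by a randomly salted hash is collision-free with probability above 1/2, so retrying successive salts quickly yields a perfect table. This is a sketch under stated assumptions: `hashWithSalt` from the `hashable` package stands in for a universal hash family, keys are assumed distinct, and `buildPerfect` and `lookupPerfect` are illustrative names.

```haskell
import Data.Array (Array, accumArray, bounds, rangeSize, (!))
import Data.Hashable (Hashable, hashWithSalt)  -- "hashable" package (assumed)
import Data.List (nub)

-- Try salts until the salted hash is injective on the (distinct) keys.
-- With m = n^2 slots each attempt succeeds with probability > 1/2.
buildPerfect :: (Eq k, Hashable k) => [(k, v)] -> (Int, Array Int (Maybe (k, v)))
buildPerfect kvs = go 0
  where
    n = length kvs
    m = max 1 (n * n)
    go salt
      | idxs == nub idxs = (salt, accumArray (\_ kv -> Just kv) Nothing
                                             (0, m - 1) (zip idxs kvs))
      | otherwise        = go (salt + 1)
      where idxs = [ hashWithSalt salt k `mod` m | (k, _) <- kvs ]

-- One hash, one array index, one comparison: worst-case constant time
-- for fixed-size keys.
lookupPerfect :: (Eq k, Hashable k) => (Int, Array Int (Maybe (k, v))) -> k -> Maybe v
lookupPerfect (salt, table) k =
  case table ! (hashWithSalt salt k `mod` rangeSize (bounds table)) of
    Just (k', v) | k' == k -> Just v
    _                      -> Nothing
```

The price for worst-case constant lookup in this simple variant is quadratic space; the full two-level FKS construction described in Cormen et al. brings that down to O(n) space.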
0

You'd have to look at this from an information-theoretical point of view: the more key/value pairs you store, the more keys you have to distinguish between on an associative lookup. Thus, no matter what you do, the more keys you have, the more complex your lookup will be.

Constant time lookup is only possible when your key directly gives you the address (or something equivalent) of the element you want to look up.
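As a concrete illustration of that last point, here is a small direct-addressing sketch in Haskell (not from the original answer; the range [0, 255] and the names `directTable` and `lookupDirect` are arbitrary choices for the example). When the key itself is a small integer, it can serve directly as the array index, and a lookup is a single array access:

```haskell
import Data.Array (Array, accumArray, bounds, inRange, (!))

-- Direct addressing: the key itself is the array index, so a lookup is
-- a single O(1) array access. Sketch for Int keys in [0, 255].
directTable :: [(Int, v)] -> Array Int (Maybe v)
directTable kvs = accumArray (\_ v -> Just v) Nothing (0, 255)
                             [ (k, v) | (k, v) <- kvs, inRange (0, 255) k ]

lookupDirect :: Array Int (Maybe v) -> Int -> Maybe v
lookupDirect table k
  | inRange (bounds table) k = table ! k
  | otherwise                = Nothing
```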

Marcus Müller
  • 34,677
  • 4
  • 53
  • 94
  • I know it's possible from an information-theoretical point of view. For instance, tries provide arbitrary associative lookup in constant time, assuming you can ignore key length. Finger trees provide the exact bound I'm looking for, I think, but I was hoping for something simpler. – Nomad010 Feb 24 '15 at 10:06
  • *assuming you can ignore key length* is what drives me nuts -- how *can* you possibly ignore key length, when considering complexity? It's like saying "my car doesn't use any gasoline, assuming you tow it". – Marcus Müller Feb 24 '15 at 10:10
  • Um, maybe ignore is not the right word here. Suppose you have a scenario where your key lengths are bounded by a constant. In most systems, that is a reasonable assumption. Sure, if the keys have unbounded size then you are likely in trouble regardless of whether or not you use a trie. – Nomad010 Feb 24 '15 at 10:23
  • Right. But then you have, let's say, 32-bit hashes and a hash table with, let's say, 1 million entries. How do you find the matching hash in constant time? Sure, you can set an upper bound (the hash table could be represented as a binary tree with a depth of 32 max), but setting an upper bound for a bounded problem and proclaiming that to be constant is, um... arbitrarily possible for any complexity, however horrible? – Marcus Müller Feb 24 '15 at 10:28
  • 1
    Whatever method you choose, accessing those memory addresses isn't going to be constant time. Even so, it's an assumption we make whenever we do complexity analysis. You don't work with the actual memory-access-time function because you're not insane. Yes, you could replace your bounded table with a bounded binary tree and it would still be constant time. You could replace the bounded table with a bounded linked list and it would still be constant time. That doesn't mean all constant time solutions are equally fast. If the table size weren't bounded, then things would be different. – Nomad010 Feb 24 '15 at 10:49
  • 1
    When performing algorithm time complexity analysis one works with a model of computation which is constructed to be theoretically convenient but which also captures important aspects of the practical problems of interest. One common simplification is that keys and numbers are constant in size and that operations on them are performed in constant time. This is not true for arbitrarily large numbers represented on a computer, but it does accurately describe numbers in the ranges one is usually interested in. – Lii Feb 24 '15 at 11:45
  • The statements made in this answer seem to fail to take the aforementioned kinds of simplifications into consideration. – Lii Feb 24 '15 at 11:46