Is boost::bimap overkill for injective functions?

Question

Let T_1 and T_2 be two types and f: Dom(T_1) -> Dom(T_2) be an injective function which is not a bijection; and for the sake of discussion suppose I get a representation of f as disparate pairs rather than code for computing it. Now, I need to be able to apply both f and f^{-1} relatively quickly, so I was thinking of a map in each direction. Then it occurred to me that I might want a data structure for these two maps together - as I have multiple such f's.

I naturally thought "Hmm, I'm sure Boost must have something like that", and indeed, Boost has a Bimap structure. The thing is, that one is intended for general binary relations; also, it has to account for the possibility of repeated insertions without re-optimizing the structure each time, while in my case I only insert once, then do many lookups. So, I feel that's bimap might be a bit overkill for me, and unoptimized for my use case. Is that true?

Notes:

I'm interested in time complexity (and actual time) at the expense of space.
Same question for non-injective f (with f^{-1} being a non-function relation).

You can tell bimap (via template parameters) what inner types to use. If you don't use a multiset structure, you won't be able to have duplicates. So you can encode the constraint you want into the type system. — Silvio Mayolo, Apr 27 '18 at 00:40
@SilvioMayolo: I don't understand what you mean by "inner types". Can you elaborate? — einpoklum, Apr 27 '18 at 00:41
@DavisHerring: It means the inverse relation, which is not a function (thus, for example, you could use a multimap for it). — einpoklum, Apr 27 '18 at 07:24
@rustyx: Large enough so that what you're thinking of won't work :-) — einpoklum, Apr 27 '18 at 13:52

score 0 · Answer 1 · answered Apr 27 '18 at 20:49

Normal C++ container behavior (reinforced by your hint about object size) is to store each key and value (which concepts are distinct only in the non-injective case) only once. The separate insert and lookup phases suggest contiguous storage (for cache locality if nothing else). The “time at the expense of space” says you want a hash table.

So store an array of pair<T1,T2> and a pair of low-load-factor, open-addressed hash tables mapping the keys and values (respectively) to indices in the array. Each of these is nothing but an array of indices (or pointers computed after the (full) array is allocated) and has decent cache performance.

Non-injective case

Sort (or at least group) the pairs by the (non-unique) T2 value, and store in the hash table (for each such unique value) the index of the beginning of the run. Then all the corresponding T1 objects can be found together (stopping at the first differing T2 or at the end of the array).

If there are many T2 objects that are == and they are interchangeable, you can instead store separate arrays of pair<T1,index> and pair<T2,index>, each storing indices into the other as above. Each entry in both hash tables then stores indices into the T1 array, since whichever key is always needed to check for equality after the hash lookup and the T2 object in a pair is immediately and unambiguously available from the index stored alongside the T1.

Is boost::bimap overkill for injective functions?

1 Answers1

Non-injective case