0

I often deal with mappings which are injective. In programming terminology, this can be expressed as a dictionary where all values are unique as well as, of course, all keys.

Is there a memory-efficient data structure for injective mappings with all the time-complexity properties you expect from dictionaries?

For example:

d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

d.get(2) = 'b'  # this works with a normal dictionary
d.get('b', reverse=True) = 2  # but this is not possible

All the solutions in Two way/reverse map seem to require using or combining two sets of mappings, focusing on making it easier to perform operations on a two-way map. This is fine for small dictionaries that fit neatly in memory, but not good for large dictionaries.

The requirement is there should be no additional memory overhead storing the injective two-way map versus a regular dictionary storing only one-way mappings.

I understand dictionaries use a hash table, which use an associative array data type. By definition, associative arrays implement key -> value mappings with unique keys. Is it possible, theoretically or in practice, to produce a smart injective mapping which allows reverse lookup?

If it is not possible, I would appreciate an explanation why such a construct is difficult, or impossible, to implement with the same efficiency as dictionaries.

Update

Following the discussion with @rpy (see comments below), any information on how to set up a python dictionary-like object using a perfect reversible hash function would be useful. But, of course, a working implementation would be ideal (I have already tried perfection).

jpp
  • 159,742
  • 34
  • 281
  • 339
  • 1
    Are the keys unique with respect to the values...? If there are keys 1, 2, 3, is there any possibility that the values could also be either of 1, 2 or 3? – cs95 Jan 29 '18 at 01:17
  • Yes, we can have {1: 1, 2: 2, 3: 3} or {1: 2, 3: 1, 2: 3}. – jpp Jan 29 '18 at 01:18
  • If your mapping is injective but not bijective, then the reverse mapping is not defined for some "possible" values. So you are actually dealing with bijections right? – Julien Jan 29 '18 at 01:22
  • Why is this question not a duplicate of the referenced question? What are you asking that is not answered there. You already have one answer here that is the same as at that question. Can you clarify why this is not a dupe? – Stephen Rauch Jan 29 '18 at 01:24
  • @StephenRauch The solution in https://stackoverflow.com/questions/1456373/two-way-reverse-map does not have a memory-efficient requirement. – jpp Jan 29 '18 at 01:25
  • But you are going to need two maps, so you are saying you are OK with the two maps, but the objects themselves need to be only stored once? – Stephen Rauch Jan 29 '18 at 01:27
  • @StephenRauch, The requirement is: there should be no additional memory overhead storing the normal dict vs injective two-way dict. If this is not possible, I would like to understand why. – jpp Jan 29 '18 at 01:29
  • You can reimplement your own bijective dictionary at the C level using pointers. But in python, you need to hash the keys + the values, and the corresponding mapped values will have to be replicated (since python does not support pointers). – cs95 Jan 29 '18 at 01:31
  • I see a 3rd party package here: https://bidict.readthedocs.io/en/latest/index.html But I'm not sure how "memory friendly" it would be. – cs95 Jan 29 '18 at 01:36
  • @COLDSPEED, understood. Is there any reference to an existing C implementation of this? Injective mappings seem a fundamental construct that might be widely used and applied. – jpp Jan 29 '18 at 01:37
  • @COLDSPEED, unfortunately `bidict` library just facilitates combining maps and syntax: "In short, to model a bidirectional map, you need two separate one-directional maps that are kept in sync as the bidirectional map changes. This is exactly what bidict does under the hood" – jpp Jan 29 '18 at 01:38
  • @StephenRauch, I'm not sure about etiquette. Would it be a good idea to offer a bounty on the question at https://stackoverflow.com/questions/1456373/two-way-reverse-map stating this requirement? – jpp Jan 29 '18 at 01:48
  • I don't know that I am an etiquette expert on this topic. But your question seems to differ in that the referenced question was using the same object space at both ends. This question is not. It may be better to offer a Bounty on your question. BUT, it is generally regarded, that questions whose answers are the same are duplicates. So I would strongly suggest you try clarifying exactly why the other question doesn't answer this and why you are looking for something different. I think it is notable that even with the confusion, there were no down votes or votes to close on this question. GL. – Stephen Rauch Jan 29 '18 at 02:24
  • Nope, your best option is to use two dictionaries, that's the best way to get O(1) lookups in *both directions*. This has already been exhaustively covered, see the duplicate. – Martijn Pieters Feb 02 '18 at 18:34
  • @MartijnPieters, would you mind explaining the theory why? Below there is a discussion on perfect reversible hash maps. Is the reason: nobody has bothered implementing it as it is highly specialised? Or that it's not theoretically possible? – jpp Feb 02 '18 at 19:07
  • @jp_data_analysis: rpy's proposed solution is *just a double hash table*, same as two dictionaries. People have implemented these, and so far they all either use two separate dictionaries (forward and reverse mappings) or use the same dictionary to store both the forward and reverse relationship (making it harder to distinguish between the keys and values). That other question has been around for 8+ years; it is safe to assume that at this point in time noone has come up with a better design. – Martijn Pieters Feb 02 '18 at 19:13

1 Answers1

1

The net answer to your question is: NO (for any efficient implementation)

You put up two requirements that can not be fulfilled at the same time:

  1. Do not use extra memory for the reverse mapping
  2. Do not add extra time for doing (reverse) look-ups

Why are those two restrictions prohibiting a solution?

Mappings are pairs of values (tuples). The most trivial implementation would be:

Sequential searching all tuples for a match.

This would have identical complexity for forward and backward mapping.
However, this clearly violates the expectation of time-complexity properties you expect from dictionaries:

If you would allow for O(n) complexity, then searching a tuple set sequentially would give you a proper solution.

Usually dictionary implementations try to get down to O(1) or at least O(n*log(n)) complexity. This is being achieved by introducing additional data for speeding up look-ups, like hashes or trees. Unfortunately, such aids only help for one direction as they either deal with keys (forward mapping case) or values (reverse mapping case).

So, as soon as you need to keep look-up complexity down (this also applies for modification complexity, but usually dictionaries are tailored towards fast look-up), you will need to add data for achieving speed.

The whole issue turns down to the classic memory vs. speed trade-off.

EDIT:

An approach for addressing the problem in a general implementation (for cases where keys and values allow for getting a numeric representant if those are not integral numbers in the first place) might be:

Calculate a hash value for key and one for value and register the tuple under both hash values. This way you can take key or value and identify the matching tuple and return the proper result. This would even work for non injective cases when you allow for returning sets of matching tuples.

This will require more space (double the hash entries) while keeping look-up complexity within values typical for hash based dicts. You might need to keep an eye on hash bucket size (length of collision chains) especially when value sets of keys and values are not disjoint)

rpy
  • 3,953
  • 2
  • 20
  • 31
  • Thanks. This has pointed me to many other resources. It seems like you might need a reversible perfect hash function. What if the "injective dictionary" used [this reversible hash function](https://stackoverflow.com/a/13018842/9209546)? Given a key, it can use it to get to value. Given value, it can use it to get to key. Isn't that what we'd need? – jpp Feb 01 '18 at 14:51
  • Such approach will work in cases where you have an a-priori known fixed set of pairs (reversible mappings). For any case of dynamic mappings (supporting a put(key,value)) using a perfect hash might turn out to be not feasible performance wise (due to need of recalculating perfect hash after every put). As current perfect hash functions are designed to map a set of values to an interval [0..n], while in your case the key and/or value set might not be "dense" in some sense, you might need two perfect hash functions (one for keys-to-hash and one for value-to-hash) that are both are reversible. – rpy Feb 01 '18 at 15:19
  • As it turns out, the mappings I have are known in advance and fixed; that's also why I know they are injective. I don't think this is an unusual occurrence either. For example, mapping between various identifiers for a specific person (social security id, driving license, personal mobile number). Is there no obvious reason why this doesn't seem to be implemented in high-level languages? – jpp Feb 01 '18 at 15:43
  • Such a solution is extremely specific to the use case and the actual set of values to be mapped. Any implementation would work exactly for the given (fixed) set of pairs. Language support usually addresses general problems (mapping key to value). – rpy Feb 01 '18 at 15:47
  • I missed a remark: Using perfect reversible hash function is not a data structure approach. It actually is an algorithmic approach the provides for calculating the value from a key and from a value to a key in one go. – rpy Feb 01 '18 at 16:18
  • *Map the hash value of both key and value to a hash bucket that is referencing any matching tuples*: that can't work. A hash value *is the hash bucket reference*. You can't have make the hash values for all possible pairings of objects point to the same hash bucket for just that pairing. If you meant to *combine* the hash values to produce a new hash to target the bucket, then you can no longer look up the pair with just one or the other object. – Martijn Pieters Feb 02 '18 at 19:17
  • @Martijin: Not correct: hashing is a function from the source domain to the numeric domain in a way that the target domain is an interval from natural numbers _[0:n]_ mostly with _n_ being a power of _2_. Given such a function for keys and maybe a different one for values of a mapping, you are in a position to identify a related hash bucket. There you then need to search for the __pair__ and check whether the searched for key (or value) is there and returning the "other" item. – rpy Feb 03 '18 at 07:29
  • So, for clarification: I did not intend to say combine both key and value to form a single hash value but hash key and hash value and register the tuple under both values effectively doubling the number of entries. – rpy Feb 03 '18 at 07:29