1

Is there an efficient implementation of a hash table, which maps key (integer) to values (string) and vice versa, for some compiled language?

Of course, one could always have two tables, one for key=>value mapping and other for value=>key. However that would not be very efficient, at least not memory-wise. Possibly both mappings can be in a single table, if the type system and intended usage allow it.

kyrill
  • 986
  • 10
  • 27
  • It is not that hard to implement. Just create entries with payload+(2 sets of {head,next} pointers), plus some machinery to handle it. – joop Mar 28 '17 at 09:59
  • Right, that seems more memory-efficient that using two hash tables. On the other hand, pointers can cause a headache when implementing reallocation of the table. – kyrill Mar 28 '17 at 10:25
  • Fixed size vs variable size can be a design-choice. OTOH resizing/doubling is not that hard (if you don't use stored pointers it is relatively easy) – joop Mar 28 '17 at 10:31
  • But in your approach you will, no? – kyrill Mar 28 '17 at 10:31
  • Use {head, next} pointers. – kyrill Mar 28 '17 at 10:34
  • 1
    No, in cases like this, I prefer to store indexes for {head,next}), because these are stable after resize/realloc() – joop Mar 28 '17 at 10:36

2 Answers2

3

One name for this is a BiMap (as in bidirectional). The obvious limitation is that keys will be distinct (like in a normal dictionary/map), but so will with values.

For Java, there's a StackOverflow question on it, but the general recommendation is the Guava BiMap.

For C and C++, Boost has a Bimap.

Internally, it's the "inefficient" implementation you mention where it keeps two hashtables. Here's the thing: it is efficient, and using twice as much memory for a secondary lookup structure is expected, and rarely a big deal.

Community
  • 1
  • 1
David Ehrmann
  • 7,366
  • 2
  • 31
  • 40
  • How do you know it internally keeps two hashtables? And what do you mean by 'it'? For [Java](https://google.github.io/guava/releases/21.0/api/docs/com/google/common/collect/BiMap.html#inverse--), "The two bimaps are backed by the same data", from which I would presume there is no duplicity. [Boost.Bimap doc](http://www.boost.org/doc/libs/1_54_0/libs/bimap/doc/html/boost_bimap/bimap_and_boost.html#boost_bimap.bimap_and_boost.bimap_and_multiindex) says "Boost.MultiIndex is, in fact, the core of the bimap container", from which it seems to me that there is only one set of data with two indices. – kyrill Mar 27 '17 at 10:16
  • 1
    For Guava, here's their [HashBiMap implementation](https://github.com/google/guava/blob/72da1ce1/guava/src/com/google/common/collect/HashBiMap.java#L107-L108). Note the two internal hashtables. What's meant by "same data" is that the references stored in those two maps are the same, so the *data* isn't actually being duplicated, just the reference. – David Ehrmann Mar 27 '17 at 15:25
0

This is the datastructureI usefor a bihash: The overhead is four ints (for the indexes) per entry.

In the example below, with typedef unsigned char Index; , the overhead will be four characters, and the maximal table capacity will be 255.


        /* For demonstration purposes these types are VERY SMALL.
        ** Normally one would use unsigned short, or unsigned int.
        */
typedef unsigned char Index;
typedef unsigned short Hashval;
        /* Use the maximal representable value as sentinel value */
#define NIL ((Index)-1)

struct entry {
        Index head_key          /* The head-of-the-chain pointers */
                , head_val;     /* ... and for the values */
        Index next_key          /* linked list for chaining the keys */
                , next_val;     /* ... and for the values */
        };

struct table {
        unsigned totlen, keylen;
                /* free points to the root of the freetree */
        Index size, free;
                /* The complete payload, for both keys and values.
                 * layout = [key0|val0|key1|val1|..] (without padding/alignment)
                 */
        char *data;
        struct entry *entries; /* All the entries. Not pointers, but the actual entries. */
        };
        /* Macros for accessing the pool of payload */
#define NODE_KEY(p,n) ((p)->data + (n) * (p)->totlen)
#define NODE_VAL(p,n) ((p)->data + (n) * ((p)->totlen+(p)->keylen))

#define TH_OK 0
#define TH_KEY_NOT_FOUND 1
#define TH_VAL_NOT_FOUND 2
#define TH_BOTH_NOT_FOUND 3
#define TH_TABLE_FULL 4
#define TH_KEY_DUPLICATE 5
#define TH_VAL_DUPLICATE 6
#define TH_BOTH_DUPLICATE 7
#define TH_TOTAL_ECLIPSE 8

/********************************************/

    /* Allocate and initialise the hash table.
    ** Note: given fixed size, the table and the payload could be statically allocated,
    ** (but we'd still need to do the initialisation)
    */


struct table * table_new( unsigned keylen, unsigned vallen, unsigned totcount )
{
Index idx;
struct table *this;

if (totcount > NIL) {
        fprintf(stderr, "Table_new(%zu,%zu,%zu): totcount(%zu) larger than largest Index(%zu)\n"
                , (size_t) keylen, (size_t) vallen, (size_t) totcount
                , (size_t) totcount, ((size_t)NIL) -1 );
        return NULL;
        }
this = malloc (sizeof *this);
this->size = totcount;
this->keylen = keylen;
this->totlen = keylen+vallen;
this->data = malloc (totcount * this->totlen );
this->entries = malloc (totcount * sizeof *this->entries );

this->free = 0; /* start of freelist */
for( idx=0; idx < this->size; idx++ ) {
        this->entries[idx].head_key = NIL;
        this->entries[idx].head_val = NIL;
        this->entries[idx].next_key = NIL;
        this->entries[idx].next_val = idx+1; /* unused next pointer reused as freelist */
        };
this-> entries[idx-1].next_val = NIL; /* end of freelist */

fprintf(stderr, "Table_new(%zu,%zu,%zu) size = %zu+%zu+%zu\n"
                , (size_t) keylen, (size_t) vallen, (size_t) totcount
        , sizeof *this, (size_t)totcount * this->totlen, totcount * sizeof *this->entries
         );

return this;
}

wildplasser
  • 43,142
  • 8
  • 66
  • 109