0

Given three identifiers, combine them into a single 32-bit value.

It is known, that the first identifier may have (2^8)-1 different values. Analogically, the second (2^8)-1 and the third (2^10)-1. Therefore the total count of identifiers of all kinds will not exceed (2^32)-1.

Example solution could be to have a map:

  • key: 32 bits,
  • value: 8 (or 10) bits.

The value would begin at 0 and be incremented every time a new identifier is provided.

Can it be done better? (instead of 3 maps) Do you see a problem with this solution?


To clarify, the identifier can hold ANY values from the range <0, 2^32). The only information that is given, is that the total number of them will not exceed (2^8)-1 (or 10th).

The identifiers can have the same values (it's completely random). Consider the randomness source memory addresses given by the OS to heap-allocated memory (e.g. using a pointer as an identifier). I realize this might work differently on x64 systems, however, I hope the general's problem solution to be similiar to this specific one.

This means that a simple bit shifting is out of question.

hauron
  • 4,550
  • 5
  • 35
  • 52
  • Why not just use three bitfields? – harold Mar 27 '14 at 14:52
  • Can't you just do `encoded = ((b10 * 256) + b8_1) * 256 + b8_2` and then the reverse to decode? Should be pretty efficient. – 500 - Internal Server Error Mar 27 '14 at 15:22
  • 1
    Need some clarification here. Are the 3 identifiers numbers? Are they distinct? Can you describe in a little more detail the range of values they can take on (see discussion in MichaelS' answer). Do you know all of the different values beforehand? – waTeim Mar 27 '14 at 15:36
  • @waTeim Yes (numbers), No (not distinct), Ranges are provided, No (randomized). Thank you for the suggestion. – hauron Mar 27 '14 at 15:51

3 Answers3

1

You can get this with bit shifting and unsafe code.

There is an article on SO: What are bitwise shift (bit-shift) operators and how do they work?

Then you can use the whole 32bit range for your three values

---- 8 bits ---- | ---- 8 bits ---- | ---- 10 bits ---- | ---- unused 6 bits ----

int result = firstValue << (8 + 10 + 6);
result += secondValue << (10 + 6);
result += thirdValue << 6;
Community
  • 1
  • 1
MichaelS
  • 3,809
  • 2
  • 26
  • 33
  • You've mis understood the problem. It's not about combining an 2 8bit numbers and a 10 bit number. Its about combining 3 32 bit number that can have 255,255,1023 different (probably random) values. – waTeim Mar 27 '14 at 15:13
  • The question seems to imply that only the lower `k` bits of each word are used to specify one of the 2^k - 1 values. – chepner Mar 27 '14 at 15:17
  • 1
    Not really and the title implies the opposite. If it had been worded 2^8 - 1 sequential values, I'd agree. I think you're reading too much into it. He was talking about the map, not about the numbers. Otherwise, this is all trivial. – waTeim Mar 27 '14 at 15:19
  • Since he will use the generated number as a key I don't think that these three values can be random. Random values can run into collisions. – MichaelS Mar 27 '14 at 15:22
  • Trivial questions are asked every day, so that alone is not enough for me to assume that the values are not consecutive. The title leans a little in that direction, but not very far - that could just as well be about the width of the type rather than the range of values. OP should clarify. – harold Mar 27 '14 at 15:27
  • Hmm could go either way; the statement of the problem should probably read distinct values or explicitly state not necessarily distinct. But you can plan for the worst since there are extra bits to encode each of the 3 inputs. – waTeim Mar 27 '14 at 15:29
  • @MichaelS Hi, thank you for the discussion. The values can be in the range of <0, 2^32). There's no restriction on the actual values, but the number of unique values that can occur (as specified). waTeim was right (thank you for pointing that out). – hauron Mar 27 '14 at 15:45
1

You could try something like this:-

#include <map>
#include <iostream>

class CombinedIdentifier
{
public:
    CombinedIdentifier (unsigned id1, unsigned id2, unsigned id3)
    {
        m_id [0] = id1;
        m_id [1] = id2;
        m_id [2] = id3;
    }

    // version to throw exception on ID not found
    static CombinedIdentifier GetIdentifier (unsigned int id)
    {
        // search m_store for a value = id
        // if found, get key and return it
        // else....throw an exception->id not found
    }

    // version to return found/not found instead of throwing an exception
    static bool GetIdentifier (unsigned int id, CombinedIdentifier &out)
    {
        // search m_store for a value = id
        // if found, get key and save it to 'out' and return true
        // else....return false
    }

    int operator [] (int index) { return m_id [index]; }

    bool operator < (const CombinedIdentifier &rhs) const
    {
        return m_id [0] < rhs.m_id [0] ? true : 
               m_id [1] < rhs.m_id [1] ? true : 
               m_id [2] < rhs.m_id [2];
    }

    bool operator == (const CombinedIdentifier &rhs) const
    {
        return m_id [0] == rhs.m_id [0] &&
               m_id [1] == rhs.m_id [1] &&
               m_id [2] == rhs.m_id [2];
    }

    bool operator != (const CombinedIdentifier &rhs) const
    {
        return !operator == (rhs);
    }

    int GetID ()
    {
        int 
            id;

        std::map <CombinedIdentifier, int>::iterator
            item = m_store.find (*this);

        if (item == m_store.end ())
        {
            id = m_store.size () + 1;
            m_store [*this] = id;
        }
        else
        {
            id = item->second;
        }        

        return id;
    }

private:
    int 
        m_id [3];

   static std::map <CombinedIdentifier, int>
       m_store;
};

std::map <CombinedIdentifier, int>
    CombinedIdentifier::m_store;

int main ()
{
    CombinedIdentifier
        id1 (2, 4, 10),
        id2 (9, 14, 1230),
        id3 (4, 1, 14560),
        id4 (9, 14, 1230);

    std::cout << "id1 = " << id1.GetID () << std::endl;
    std::cout << "id2 = " << id2.GetID () << std::endl;
    std::cout << "id3 = " << id3.GetID () << std::endl;
    std::cout << "id4 = " << id4.GetID () << std::endl;
 }
Skizz
  • 69,698
  • 10
  • 71
  • 108
  • Brilliant and clean (much cleaner than three separate maps), and with easy access. Thank you for this answer. – hauron Mar 27 '14 at 15:57
  • So basically it is storing the three numbers as an array (which is a 96-bit consecutive memory)? – justhalf Mar 28 '14 at 07:43
  • @justhalf: yes, at a basic level it is, but the OP didn't specify the constraints on the IDs other than the quantity of them, so bit packing could lead to loss of data if `max(ID)` was greater than the bits available. This way preserves all three IDs. More importantly, the three IDs can be indexed using a single `int`, although I've just noticed there should be an easy way to convert from the single ID back to the three IDs (will edit answer). – Skizz Mar 28 '14 at 07:59
1

I think you could make use of a Perfect Hash Function. In particular, the link provided in that that article to Pearson Hashing seems to be appropriate. You might even be able to cut-and-paste the included C program the 2nd article except for the fact that its output is a 64-bit number not a 32-bit one. But if you modify it slightly from

for (j=0; j<8; j++) {
     // standard Pearson hash (output is h)

to

for (j=0; j<4; j++) {
     // standard Pearson hash (output is h)

You'll have what you need.

waTeim
  • 9,095
  • 2
  • 37
  • 40
  • I like this, however, I will use the simplier approach by Skizz. The only reason being clarity (it's going to be coded, and I won't be the only one maintaining it). Thank you for the answer and the link. – hauron Mar 27 '14 at 18:02
  • Another thing. This solution would not yield perfect hashing without prior good knowledge of possible inputs. A collision is unfortunately a problem. The good side would be no shared code/variables -> easy multithreading. The chosen solution will require a mutex of some sorts... – hauron Mar 27 '14 at 18:44