Is an unordered_set an appropriate data structure for storing vector elements? And if so how would I go about implementing the hash function?

Question

so this question is prompted by a HackerRank practice exercise that I am currently attempting. The exercise "Journey to the Moon", takes a given list of pairs of astronauts that are prevented from being grouped together in the final result.

The original container for the list of astronauts was vector<vector<int>> astronaut, but in my implementation I changed this list (after researching) to be an unordered_set<vector<int>> astronauts because I found that a lot of the overhead of this problem was satisfied by this data structure. The problem is that I now don't know how I should go about hashing each element of astronaut. I understand that standard C++ does not provide a default implementation for hashing vector values, and that I must provide my own implementation for the hashing to the vector template. But just how do I do that (I don't understand much about hashing). However, I also read that one should avoid using containers as keys for unordered_sets; hence I am stuck.

Is an unordered_set really that best data structure for what I am trying to do: store unique pairs of integers within a set where the order of the pairs doesn't particularly matter, and provide constant-time access to the elements or is there a better container for what I am trying to do. Here is my code prior to attempting to implement hashing. main() and split_string() were pre-defined. Thanks in advance for the help!

HackerRank Link: https://www.hackerrank.com/challenges/journey-to-the-moon/problem

using namespace std;

vector<string> split_string(string);

template <>
struct hash<pair<int, int> > {
    size_t operator()(const pair<int, int>& x) const noexcept
    {
        return (size_t)x.first * x.second + x.first + x.second;
    }
};

struct custom_set : unordered_set<int>
{
    void pair_insert(pair<int, int> pair)
    {
        insert(pair.first);
        insert(pair.second);
    }

    void pairs_insert(std::initializer_list <pair<int, int>> pairs)
    {
        for (pair<int, int> pair : pairs)
      {
            insert(pair.first);
            insert(pair.second);
        }
    }
};


pair<int, int> journeyToMoon(pair<int, int> id_pair1, unordered_set<pair<int, int>,      hash<pair<int, int>>> * astronaut,
    custom_set * temp_set, unordered_set<pair<int, int>>::iterator it);


int journeyToMoon(int n, unordered_set<pair<int, int>, hash<pair<int, int>>> *     astronaut)
//astronaut ids numbered : [0, n-1]
{



    vector<unordered_set<int>> sets_of_bounded_astronauts;
    vector<int> num_bounded_astronauts_each_set;
    int num_bounded_astronauts_total = 0, num_free_astronauts = 0, result = 0;

    while (!astronaut->empty())
    {


        pair<int, int> id_pair = *astronaut->begin();
        custom_set * temp_set = new custom_set;
        journeyToMoon(id_pair, astronaut, temp_set, ++astronaut->begin());
        sets_of_bounded_astronauts.push_back(*temp_set);
        num_bounded_astronauts_each_set.push_back(sets_of_bounded_astronauts.back().size()); 
        num_bounded_astronauts_total += sets_of_bounded_astronauts.back().size(); 
        delete temp_set;
    }

    num_free_astronauts = n - num_bounded_astronauts_total;

    for (int i = 0; i < num_bounded_astronauts_each_set.size() - 1; i++)
    {
        for (int j = i + 1; j < num_bounded_astronauts_each_set.size(); j++)
            result += num_bounded_astronauts_each_set[i] *    num_bounded_astronauts_each_set[j];
        result += num_free_astronauts * num_bounded_astronauts_each_set[i];
    }

    result += num_free_astronauts * num_bounded_astronauts_each_set.back() +     (num_free_astronauts * (num_free_astronauts - 1))/2;

    return result;
}

pair<int, int> journeyToMoon(pair<int, int> id_pair1, unordered_set<pair<int, int> ,     hash<pair<int, int>>> * astronaut,
    custom_set * temp_set, unordered_set<pair<int, int>>::iterator it)
{

    while (!astronaut->empty() && it != astronaut->end()) {
    // copy the current iterator then increment it
        astronaut->erase(id_pair1);
        pair<int, int> id_pair2 = *it++;

        if (id_pair2.first == id_pair1.first || id_pair2.first ==   id_pair1.second || id_pair2.second == id_pair1.first
            || id_pair2.second == id_pair1.second)
        {           
            temp_set->pairs_insert({ id_pair1, journeyToMoon(id_pair2,     astronaut, temp_set, 
                id_pair2 != *astronaut->begin() ? astronaut->begin() : ++astronaut->begin()) });
        }
    }
    astronaut->erase(id_pair1);
    temp_set->pair_insert(id_pair1); //the case where id_pair1 is not matched with any other pairs in the list and also the case
//where astronaut.size() == 1; if it so happens that id_pair1 was already inserted then the functionality of sets prevents duplicates
    return id_pair1;

}

int main()
{



    string np_temp;
    std::getline(std::cin, np_temp);

    vector<string> np = split_string(np_temp);

    int n = stoi(np[0]);

    int p = stoi(np[1]);

    unordered_set<pair<int, int>, hash<pair<int, int>>> * astronaut = new     unordered_set<pair<int, int>, hash<pair<int, int>>>(p);
    for (int i = 0; i < p; i++) {
        int a, b;
        std::cin >> a >> b;
        astronaut->insert(pair<int, int>(a, b));
        }

    std::cin.ignore(numeric_limits<streamsize>::max(), '\n');

    int result = journeyToMoon(n, astronaut);


    std::cout << result << "\n";

    delete astronaut;

    return 0;
}

vector<string> split_string(string input_string)
{
    string::iterator new_end = unique(input_string.begin(), input_string.end(), [](const char &x, const char &y) {
        return x == y && x == ' ';
    });

     input_string.erase(new_end, input_string.end());

    while (input_string[input_string.length() - 1] == ' ') {
    input_string.pop_back();
    }

    vector<string> splits;
    char delimiter = ' ';

    size_t i = 0;
    size_t pos = input_string.find(delimiter);

    while (pos != string::npos) {
        splits.push_back(input_string.substr(i, pos - i));

        i = pos + 1;
        pos = input_string.find(delimiter, i);
    }

    splits.push_back(input_string.substr(i, min(pos, input_string.length()) - i +     1));

    return splits;
}

Not an answer to "is this the best data structure," but implementing your own hash function isn't too hard. See [here](https://stackoverflow.com/questions/29855908/c-unordered-set-of-vectors) and [here](https://stackoverflow.com/questions/2590677/how-do-i-combine-hash-values-in-c0x/2595226#2595226) for more... — scohe001, Jun 01 '18 at 20:29
If you only store unique pairs of 32-bit ints in your set, you could consider using `std::pair` instead of `std::vector`. In addition you could order the pair to avoid mirroring and then for hashing you can take the already defined `std::hash` for a 64-bit int and merge the two 32-bits together to get a unique hash. — super, Jun 01 '18 at 20:33
You are aware that you are not supposed to edit provided code on hackerrank challenges? If the function accepts vector, it should keep accepting vector. — SergeyA, Jun 01 '18 at 20:34
@SergeyA Thanks for alerting me. I had an inkling, but wasn't sure if we were free to edit the problem or not. But say I had not altered the provided code, but instead copied the vector> into the unordered_set that I am using now. Would that not be the same thing as what I am doing now, but with more code. What's the point of maintaining the provided data structure if I'm not going to use it anyway. Or is it that I'm supposed to find a way to use it? — Stanton Dobson, Jun 10 '18 at 01:09

Edy · Answer 1 · 2018-06-01T22:46:58.197

1

In general, unordered_set is NOT an appropriate data structure for storing elements in a vector, because doing so destroys, by definition, the ordering of the original elements, which is a key feature of the vector.

In your case, however, it appears the ordering of the list of astronaut pairs doesn't matter (as long as the exclusion of all pairs are satisfied). So in this particular case, you probably could use unordered_set instead of vector to store the list.

In fact, instead of vector<int>, you should use pair<int, int> to store a pair of astronauts, and then implement the hash function as follows:

template <>
struct hash<pair<int, int> > {
  size_t operator()(const pair<int, int>& x) const noexcept {
    return (size_t)x.first * x.second + x.first + x.second;
  }
};

EDIT: The choice of the simple hashing "algorithm" may not be perfect -- you could certainly improve on it. Note that it makes Hash(a,b) == Hash(b,a), which might be a property that suits the application here. You may wish to implement your own unordered_pair and use it instead of std::pair.

The unordered set of astronaut pairs could then be defined as

unordered_set<pair<int, int> > astronaut_pairs;

See also: https://stackoverflow.com/a/18098536/4509057

edited Jun 01 '18 at 22:46

answered Jun 01 '18 at 22:13

Edy

462
3
9

Thank you for replying. I'll give this implementation a shot, and will get back to you with the results. – Stanton Dobson Jun 01 '18 at 22:33
Hi, sorry it took me so long to get back to you. I was a bit busy. So, I implemented your simple hash function which allowed me to use the unordered_set. My program since then has gone through a lot of revision concerning my implementation of the recursive part of the program. Since I was iterating through and simultaneously deleting pairs from the set of astronaut pairs, I had to ensure that the iterator wasn't invalidated. Currently, my program works for several small test cases but fails on larger input... – Stanton Dobson Jun 10 '18 at 00:59
I was able to figure out what was causing the program to fail but I don't exactly understand why. With large lists, my program does link and delete all connected pairs within the list of astronauts. But when it reaches the end of the list in a particular recursive call and starts to insert the linked pairs into temp_list, the iterator of the previous call become undefined, which normally would not be a problem except the recursive call is nested in a while loop that checks the iterator... – Stanton Dobson Jun 10 '18 at 01:01
To be honest, I believe that once the end of the list is reach the while loop should be ignored. I don’t really understand why it’s being checked for every call. I’m thinking that I may need to place the recursive call outside of the while loop or maybe created a bool variable that is set to false once the end is reached. But, I’m not sure if this is the right approach. Perhaps you know a better way. Thanks again! – Stanton Dobson Jun 10 '18 at 01:01
@StantonDobson It is difficult to answer with only a vague description of the recursive algorithm and its problem. Please provide more detail -- is the problem related to the hash function? Or perhaps you should open a different question for it? – Edy Jun 12 '18 at 00:38
@StantonDobson It is difficult to answer with only a vague description of the recursive algorithm and its problem. Please provide more detail -- is the problem related to the hash function? Or perhaps you should open a different question for it? – Edy Jun 12 '18 at 00:40

Is an unordered_set an appropriate data structure for storing vector elements? And if so how would I go about implementing the hash function?

1 Answers1