0

I need to map a std::vector<uint64_t> to a single uint64_t. It is possible to do? I thought to use a hash function. Is that a solution?

For example, this vector:

std::vector<uint64_t> v {
  16377,
  2631694347470643681,
  11730294873282192384
}

should be converted into one uint64_t.

If a hash function is not a good solution (e.g. high percentage of collision) there is an alternative to do this mapping?

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
th3g3ntl3man
  • 1,926
  • 5
  • 29
  • 50
  • Get a few of the values (say the first 3), multiply each with a number that's a prime. And you have a new uint64_t value that will do as hash. Don't mind the overflows something todo with modulo 2^n not sharing a foctor with prime numbers. How many numbers you need depends on how unique the values you have in the vector are. – Pepijn Kramer Sep 01 '21 at 17:09
  • what are you going to do with the hash? Is the vector to be inserted to an unordered_set? – Albin Paul Sep 01 '21 at 17:13
  • 1
    related/dupe: https://stackoverflow.com/questions/20511347/a-good-hash-function-for-a-vector – NathanOliver Sep 01 '21 at 17:15
  • Another idea is to use multiple hashes for the same vector so that the probabilty of collision decreases. Like how bloomfilters work. – Albin Paul Sep 01 '21 at 17:17
  • 2
    Note that hashing is not compression. Compression implies some ability to decompress. – Mark Adler Sep 01 '21 at 17:39

3 Answers3

1

I need to hash a std::vector<uint64_t> to a single uint64_t. It is possibile to do?

Yes, variable length hash functions exist, and it's possible to implement them in C++.

C++ standard library comes with a few hash functions, but unfortunately not for vector (other than for the bool specialisation). We can reuse the hash function provided for string views, but this is a bit of a cludge:

const char* data = reinterpret_cast<const char*>(v.data());
std::size_t size = v.size() * sizeof(v[0]);
std::hash<std::string_view> hash;
std::cout << hash(std::string_view(data, size));

Note that using this is reasonable only in the case std::has_unique_object_representations_v is true of the element type of vector. I think it's reasonable to assume that to be the case for std::uint64_t.

A caveat when using standard library hash functions is that they don't have exact specification and as such you cannot rely on hashes being identical across separate systems. You should use another hash function if that is a concern.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • Are you sure that `string_view` hash would not stop hashing if it sees embedded zeros (which are very likely for small integers put into `uint64_t`)? – Alex Guteniev Sep 01 '21 at 18:49
  • @AlexGuteniev The hash function isn't exactly specified. Technically, it could be as bad as hashing the first character or whatever. But if we assume a reasonable implementation, then given that strings may contain null terminators, and given that string views aren't guranteed to contain any null terminators I don't see a reason why the hash function would stop at the first one. – eerorika Sep 01 '21 at 19:06
0

You can create an std::map<std::vector<uint64_t>, uint64_t>, create a compare function for your vectors and just keep adding them to a map while incrementing a counter.

That counter will be your hash value.

Vlad Feinstein
  • 10,960
  • 1
  • 12
  • 27
0

The comment above in code :

#include <array>
#include <algorithm>
#include <vector>
#include <iostream>

static std::array<size_t,5> primes = { 3,5,7,11,13 };

static std::uint64_t hash(const std::vector<std::uint64_t>& v)
{
    std::uint64_t hash = v[0];
    for (size_t n = 1; n < std::min(primes.size(), v.size()); ++n) hash += (primes[n]*v[n]);
    return hash;
}

int main()
{
    std::vector<uint64_t> v{ 16377,  2631694347470643681,  11730294873282192384 };
    std::cout << hash(v);
    return 0;
}
Pepijn Kramer
  • 9,356
  • 2
  • 8
  • 19