2

My main data object is a array of doubles of a length that depends on a specific instantiation of my class. I would like to construct a very simple hash table to store/retrieve these objects, and we can assume that the numbers are generated in a way that is free of numerical error.

int main() {
  std::tr1::unordered_map<double*, double*> cache;

  double x1[] = { 1.0, 3.14 };
  double x2[] = { 1.0, 3.14 };

  cache[x1] = x1;

  std::cout << "x1: " << cache.count(x1) << std::endl;
  std::cout << "x2: " << cache.count(x2) << std::endl;

  return 0;
}

The above obviously only compares the pointers, giving the output:

> ./tmp
x1: 1
x2: 0

When I really want to see:

> ./tmp
x1: 1
x2: 1

It's pretty clear how to create custom hashing and equality functions when the size of the arrays are fixed at compile time but I do not know how to make custom functions that depend on a specific instantiation... I created a class below, but I'm not sure if it's useful, or how it could be used.

class Hash_double_vec {

public:
  int dim;
  Hash_double_vec(int d) { dim = d; }

  size_t operator()(const double *x) const{
    std::tr1::hash<double> hash_fn;
    size_t r = hash_fn(x[0]);
    for(int i=1;i<dim;i++) r ^= hash_fn(x[i]);
    return r;
  }

  bool operator()(const double *x, const double *y) const{
    for(int i=1;i<dim;i++) if (fabs(x[i]-y[i]) > 1e-10) return false;
    return true;
  }
};
svick
  • 236,525
  • 50
  • 385
  • 514
fairidox
  • 3,358
  • 2
  • 28
  • 29

1 Answers1

3

One way would be to create struct to hold the pointer to the sequence of doubles:

struct DoubleRegion
{
    double* p;
    size_t size;
};

bool operator==(DoubleRegion a, DoubleRegion b)
{
    return a.size == b.size && memcmp(a.p, b.p, a.size) == 0;
}

size_t hash(DoubleRegion dr) 
{
    size_t h = 0;
    for (double* p = dr.p; p != dr.p + dr.size; ++p)
        h ^= hash(*p);
    return h;
}

And then use it:

unordered_map<DoubleRegion, DoubleRegion> cache;

Of course it is your problem to make sure the lifetime of the backing memory is a superset of the lifetime of the DoubleRegion.

Old Answer:

If you don't know until runtime how big the key and value is going to be, use a std::vector:

unordered_map<vector<double>, vector<double>> cache;

If you know at compile-time how big you can use an std::array:

unordered_map<array<double, N>, array<double, N>> cache;

In both cases the default hashing function will work by value as you want, and you do not need to define a custom one.

Andrew Tomazos
  • 66,139
  • 40
  • 186
  • 319
  • I'm using double arrays because it's a scientific application with a lot of manipulations on these objects, double[] has significantly less memory and computational overhead than stl::vector. I could implement the hash by constantly converting between them but that is obviously expensive. – fairidox Mar 24 '12 at 03:28
  • Apart from two size_t counts (one for capacity, one for size) there is virtually no overhead - but why can't you use std::array? What are the possible values of N? Can you make N a template parameter of your class? If so you can pass that through to std::array. – Andrew Tomazos Mar 24 '12 at 03:33
  • N is somewhere between 1 and 100, the overhead involved is related to manipulating long strings of doubles. that is, I have a double[] array 10 million long and I grab out pieces of the array in sizes between 1 and 100 randomly and I need to compute various statistics. I could perhaps have done this with a instead but it's much easier to pass around a &array[k] opposed to constructing an iterator, looping over said iterator to construct a new , then passing along that vector – fairidox Mar 24 '12 at 03:39
  • Ok so you want to avoid copying the memory from the big array. I've updated my answer to sketch how to do it by reffering to regions. – Andrew Tomazos Mar 24 '12 at 03:56
  • Ah, yes, this works. In the mean time I did something similar, the point is you have to define the key of the data structure to carry the size as well. i.e. it cannot be a global, at least not easily. – fairidox Mar 24 '12 at 04:38
  • @AndrewTomazos Is the old answer correct? It doesn't work for me. Please see: http://stackoverflow.com/questions/35183379/compilation-error-while-creating-array-as-key-of-unordered-map – Lokesh Feb 03 '16 at 17:24
  • @Lokesh: Looking into it. At the moment it looks like vector and array don't have hash specializations, so my old answer is not correct. Need to confirm. – Andrew Tomazos Feb 04 '16 at 10:07