
I am trying to use OpenMP to parallelize a for loop in which I insert/update a hash map (std::unordered_map).

The hash map and the keys are members of a class, so I assigned pointers to hold their addresses. Each key is a hash value returned by a global function.

The following seems like the easiest way of doing this, but the hash map is not updated correctly. Something is wrong, but I am not sure how to fix it. Thanks in advance.

void MyClass::ProcessBuffer(void)
{
    omp_set_num_threads(4);
    std::unordered_map<unsigned long long,unsigned int>* hashptr=&m_sequencehash;
    std::vector<std::string>* bufferptr=&m_buffer;
    unsigned int sizevec=m_kmer_size;
    size_t i;
    #pragma omp parallel for
    for (i=0; i<READSTR_BUF_SIZE;++i)
    {
        ++(*hashptr)[_hash((*bufferptr)[i],sizevec)];
    }

}
Krcn U
    `std::unordered_map` is not thread safe, and so your code has a race condition, resulting in undefined behavior. – Ami Tavory May 01 '17 at 20:51
  • Maybe [Adding data to a hashmap in parallel](http://stackoverflow.com/questions/10064372/adding-data-to-a-hashmap-in-parallel) – Arash May 01 '17 at 21:00
  • Just curious about the performance gain you will get from paralleling an operation that will eventually require locking to be thread safe?! – Mohamed Moanis May 01 '17 at 21:06
  • You might consider the use of a TBB concurrent hashmap https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/concurrent_hash_map.html you don't need to use the rest of TBB , though that might also make sense! – Jim Cownie May 03 '17 at 08:41
  • A concurrent map is an overkill for this purpose since it only needs to access concurrently during the initialization step. However, in memory map-reduce library will fit perfectly for this scenario. – Liran Funaro May 04 '17 at 16:35

1 Answer


The easiest way to solve this is to create a new map for each thread, then reduce them into a single map sequentially. This is a classic map-reduce scenario.

// omp_get_num_threads() returns 1 outside a parallel region,
// so query the maximum thread count instead.
int s = omp_get_max_threads();
std::vector<std::unordered_map<unsigned long long,unsigned int>> res(s);

// Map step: each thread updates only its own map.
#pragma omp parallel for
for (size_t i = 0; i < READSTR_BUF_SIZE; ++i)
{
    int t = omp_get_thread_num();
    ++res[t][_hash((*bufferptr)[i], sizevec)];
}

// Reduce step: merge the per-thread maps sequentially.
for (int t = 0; t < s; t++) {
    for (const auto& r : res[t]) {
        (*hashptr)[r.first] += r.second;
    }
}

Performing the reduction concurrently might be dangerous because you will still have to access the same map concurrently. If you don't know the implementation of the map, you can't know for sure that this is safe.

Alternatively, you can partition the hash values between different maps by assigning different hash intervals to different buckets. This prevents different threads from accessing the same hash value in the reduce step, so the reduction itself can run in parallel. However, on a small dataset it is hard to find a good partition function with a small number of buckets, and using too many buckets can add significant overhead compared to the serial approach.

Liran Funaro
  • The answer is on the right track, but the definition of `res` / access to `r` in the `parallel for` is wrong. You should also use a proper reduction instead of the manual one. See also [this answer](http://stackoverflow.com/a/43064331/620382). – Zulan May 04 '17 at 16:06
  • Thanks, I fixed the access to res. – Liran Funaro May 04 '17 at 16:09
  • Using proper reduction is dangerous because you will still have to access the same map concurrently. I don't know the implementation of the map so I can't know for sure that this is safe. – Liran Funaro May 04 '17 at 16:10
  • You still have only one shared map in your code. While I can't quite find something in the standard, I'm fairly it is guaranteed that reduction combiners that work on the same data are executed mutually exclusively and with proper memory isolation. Otherwise all implicitly and user declared combiners I have ever seen would be wrong. – Zulan May 04 '17 at 16:24
  • A user defined combiner will have to allocate a new map for each thread at each iteration of the reduction (log(#CPUs)). That might be more expensive (due to the constant malloc) compared to the serial (manual) reduction. – Liran Funaro May 04 '17 at 16:32
  • No, it does not need to allocate a new map – one of the inputs is also the output of the combiner. Also, don't assume `log(nthreads)` operations ([related answer](http://stackoverflow.com/a/35805567/620382)). – Zulan May 04 '17 at 17:25