I am trying to measure the performance of concurrent insertion into a folly
hash map. A simplified version of the program that performs such insertion is shown below:
#include <folly/concurrency/ConcurrentHashMap.h>

#include <chrono>
#include <cmath>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

const int kNumMutexLocks = 2003;
std::unique_ptr<std::mutex[]> mutices(new std::mutex[kNumMutexLocks]);

__inline__ void
concurrentInsertion(unsigned int threadId, unsigned int numInsertionsPerThread,
                    unsigned int numInsertions, unsigned int numUniqueKeys,
                    folly::ConcurrentHashMap<int, int> &follyMap) {
  unsigned int base = threadId * numInsertionsPerThread;
  for (unsigned int i = 0; i < numInsertionsPerThread; i++) {
    unsigned int idx = base + i;
    if (idx >= numInsertions)
      break;
    int val = idx;
    int key = val % numUniqueKeys;
    mutices[key % kNumMutexLocks].lock();
    auto found = follyMap.find(key);
    if (found != follyMap.end()) {
      int oldVal = found->second;
      if (oldVal < val) {
        follyMap.assign(key, val);
      }
    } else {
      follyMap.insert(key, val);
    }
    mutices[key % kNumMutexLocks].unlock();
  }
}

void func(unsigned int numInsertions, float keyValRatio) {
  const unsigned int numThreads = 12; // Simplified just for this post
  unsigned int numUniqueKeys = numInsertions * keyValRatio;
  unsigned int numInsertionsPerThread =
      ceil(numInsertions * 1.0 / numThreads);

  std::vector<std::thread> insertionThreads;
  insertionThreads.reserve(numThreads);

  folly::ConcurrentHashMap<int, int> follyMap;

  auto start = std::chrono::steady_clock::now();
  for (unsigned int i = 0; i < numThreads; i++) {
    insertionThreads.emplace_back([&, i] {
      concurrentInsertion(i, numInsertionsPerThread, numInsertions,
                          numUniqueKeys, follyMap);
    });
  }
  for (unsigned int i = 0; i < numThreads; i++) {
    insertionThreads[i].join();
  }
  auto end = std::chrono::steady_clock::now();
  auto diff = end - start;
  float insertionTimeMs =
      std::chrono::duration<double, std::milli>(diff).count();

  std::cout << "i: " << numInsertions << "\tj: " << keyValRatio
            << "\ttime: " << insertionTimeMs << std::endl;
}

int main() {
  std::vector<float> js = {0.5, 0.25};
  for (auto j : js) {
    std::cout << "-------------" << std::endl;
    for (int i = 2048; i < 4194304 * 8; i *= 2) {
      func(i, j);
    }
  }
}
The problem is that calling func from this loop in main suddenly increases the measured time. If I instead call the function directly from main, without any loop (as shown below), the measured time is, for some cases, more than 100X smaller.
int main() {
  func(2048, 0.25); // ~100X faster now that the loop is gone
}
Possible Reasons
- I allocate a huge amount of memory while building the hash map. I suspect that when I run the code in a loop, the second iteration executes while the allocator is still busy freeing the memory from the first iteration, which would make the program much slower. If this is the case, I'd be grateful if someone could suggest a change that lets me get the same results with the loop.
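One change I'm considering to test this hypothesis is to time insertion and destruction as separate regions, so that freeing one run's memory can never be billed to the next run. Here is a sketch of the idea; it uses std::unordered_map as a stand-in for folly::ConcurrentHashMap (my assumption being that the measurement pattern is independent of the container):

```cpp
#include <chrono>
#include <memory>
#include <unordered_map>
#include <utility>

// Returns {insertion time, destruction time} in milliseconds, measured
// separately. Holding the map behind a unique_ptr lets us destroy it at a
// chosen point (map.reset()) instead of at end of scope, so destruction
// cost is captured in its own timed region.
std::pair<double, double> timedRun(unsigned int numInsertions) {
  auto map = std::make_unique<std::unordered_map<int, int>>();

  auto start = std::chrono::steady_clock::now();
  for (unsigned int i = 0; i < numInsertions; i++) {
    map->emplace(i, i);
  }
  auto mid = std::chrono::steady_clock::now();

  map.reset(); // free all of the map's memory here, inside the second region
  auto end = std::chrono::steady_clock::now();

  return {std::chrono::duration<double, std::milli>(mid - start).count(),
          std::chrono::duration<double, std::milli>(end - mid).count()};
}
```

If the destruction times turn out to be large and growing with the map size, that would support the "freeing memory from the previous iteration" explanation.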
More Details
Please note that unrolling the loop in main does not help; the following program has the same problem:
int main() {
  performComputation(input A);
  ...
  performComputation(input Z);
}
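Another workaround I'm considering (a sketch, not something I have verified) is to run each measurement in its own forked process, so that heap and allocator state from one run cannot leak into the next. runOnce below is a hypothetical stand-in for func(i, j):

```cpp
#include <sys/wait.h>
#include <unistd.h>

#include <vector>

// Stand-in for the real workload (func(i, j)): allocate and touch memory.
void runOnce(int n) {
  std::vector<int> v(n, 1);
  volatile int sink = v[0]; // keep the work from being optimized away
  (void)sink;
}

// Run the workload in a forked child; the parent waits for it to exit.
// All memory the child allocated is reclaimed by the OS when it exits,
// so it cannot affect later measurements in the parent.
int runIsolated(int n) {
  pid_t pid = fork();
  if (pid == 0) {   // child: do the measurement, then exit immediately
    runOnce(n);
    _exit(0);
  }
  int status = 0;
  waitpid(pid, &status, 0); // parent: block until the child finishes
  return status;            // 0 on a clean exit
}
```

This relies on POSIX fork/waitpid, so it is Linux/macOS-only, and each child would need to print or otherwise report its own timing.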
Sample Output
The output of the first program is shown here:
i: 2048 j: 0.5 time: 1.39932
...
i: 16777216 j: 0.5 time: 3704.33
-------------
i: 2048 j: 0.25 time: 277.427 <= sudden increase in execution time
i: 4096 j: 0.25 time: 157.236
i: 8192 j: 0.25 time: 50.7963
i: 16384 j: 0.25 time: 133.151
i: 32768 j: 0.25 time: 8.75953
...
i: 2048 j: 0.25 time: 162.663
Running func alone in main with i=2048 and j=0.25 yields:
i: 2048 j: 0.25 time: 1.01
Any comment/insight is highly appreciated.