I'm in the process of adding multithreading to several CPU-intensive processes on a list of long-lived object pointers. Roughly 60 million of these objects were created and added to a primary list on the main processing thread.
All of the work occurs in two lambda functors, one to process the data (myMap) and one to collect the results (myReduce). The main list gets divided into four sub-lists of roughly 15 million each and sent to QtConcurrent::mappedReduced to do work. Here's some example code:
//main thread
const int count = 60000000;
QList<MyObject*> list;
for(int i = 0; i < count; ++i) {
    MyObject* obj = new MyObject;
    obj->readFromFile(path); // obj is a pointer, so use -> rather than .
    list << obj;
}
QList<QList<MyObject*> > sublists;
for(int i = 0; i < count; i += count/4) {
    sublists << list.mid(i, count/4);
}
QThreadPool::globalInstance()->setMaxThreadCount(1); //slowdown when set to 4??
Result results_total;
std::function<Result (const QList<MyObject*>&)>
    myMap = [](const QList<MyObject*>& m) -> Result {
        //do lots of work on individual MyObjects, querying and modifying them
    };
auto myReduce = [&results_total](bool& /*noreturn*/, const Result& result) {
    results_total.count += result.count;
    results_total.othernumber += result.othernumber;
};
QFutureWatcher<void> fw;
fw.setFuture(QtConcurrent::mappedReduced<bool>(
    sublists, myMap, myReduce,
    QtConcurrent::OrderedReduce | QtConcurrent::SequentialReduce));
fw.waitForFinished();
Here's the kicker: When I setMaxThreadCount to 4 instead of 1, the procedure slows down by 10% instead of speeding up 200-400%. I used the exact same methodology (split a list into fourths and run it through QtConcurrent) on another procedure and ran it on the exact same dataset for a roughly 4x speed boost as expected by using 4 threads instead of 1.
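To check whether Qt itself is involved, the same split/map/reduce pattern can be reproduced with only the standard library. This is a minimal sketch rather than my real code: Item, mapChunk, and runChunks are hypothetical stand-ins for MyObject, myMap, and the mappedReduced call, and it assumes a C++11 compiler with std::async available.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-ins for MyObject and Result; not the real classes.
struct Item { long value; };
struct Result { long count = 0; long othernumber = 0; };

// "Map" step: process one chunk [begin, end) of the pointer list.
// Each chunk touches a disjoint range, so no locking is needed.
Result mapChunk(const std::vector<Item*>& items,
                std::size_t begin, std::size_t end) {
    Result r;
    for (std::size_t i = begin; i < end; ++i) {
        items[i]->value += 1;              // query and modify the object
        r.count += 1;
        r.othernumber += items[i]->value;
    }
    return r;
}

// Split the list into `parts` chunks, run each chunk as its own
// asynchronous task, then reduce the results sequentially and in order.
Result runChunks(const std::vector<Item*>& items, std::size_t parts) {
    std::vector<std::future<Result>> futures;
    const std::size_t n = items.size();
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t begin = n * p / parts;
        const std::size_t end = n * (p + 1) / parts;
        futures.push_back(std::async(std::launch::async,
            [&items, begin, end] { return mapChunk(items, begin, end); }));
    }
    Result total;
    for (auto& f : futures) {
        const Result r = f.get();          // "reduce" step on the caller
        total.count += r.count;
        total.othernumber += r.othernumber;
    }
    return total;
}
```

Timing runChunks(items, 1) against runChunks(items, 4) on the same data would show whether the slowdown follows the pattern itself or something specific to what myMap does to each MyObject.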
Googling around suggests that there must be a shared resource somewhere in the myMap functor, but I can't find anything at all that's shared between the processing threads other than the original list of MyObjects that exist on the main thread.
So here's the question: Does the fact that MyObject was created in a different thread than the processing thread matter if I can guarantee that there are no synchronization issues? This link suggests it doesn't matter, but that heap memory block seems to be the only thing both threads share.
I'm running Qt 4.8.6 on Windows 7 Pro x64 with an i7 processor.