QtConcurrent slowdown with long-lived object pointers

Question

I'm in the process of adding multithreading to several CPU-intensive processes on a list of long-lived object pointers. Roughly 60 million of these objects were created and added to a primary list on the main processing thread.

All of the work occurs in two lambda functors, one to process the data (myMap) and one to collect the results (myReduce). The main list gets divided into four sub-lists of roughly 15 million each and sent to QtConcurrent::mappedReduced to do work. Here's some example code:

//main thread
const int count = 60000000;
QList<MyObject*> list;
for(int i = 0; i < count; ++i) {
    MyObject* obj = new MyObject;
    obj->readFromFile(path);
    list << obj;
}

QList<QList<MyObject*> > sublists;
for(int i = 0; i < count; i += count/4) {
    sublists << list.mid(i, count/4);
}

QThreadPool::globalInstance()->setMaxThreadCount(1);    //slowdown when set to 4??
Result results_total;

std::function<Result (const QList<MyObject*>&)>
myMap = [](const QList<MyObject*>& m) -> Result {
    //do lots of work on individual MyObjects, querying and modifying them
};

auto myReduce = [&results_total](bool& /*noreturn*/, const Result& result) {
    results_total.count += result.count;
    results_total.othernumber += result.othernumber;
};

QFutureWatcher<void> fw;
fw.setFuture(QtConcurrent::mappedReduced<bool>(
             sublists, myMap, myReduce,
             QtConcurrent::OrderedReduce | QtConcurrent::SequentialReduce));
fw.waitForFinished();

Here's the kicker: When I setMaxThreadCount to 4 instead of 1, the procedure slows down by 10% instead of speeding up 200-400%. I used the exact same methodology (split a list into fourths and run it through QtConcurrent) on another procedure and ran it on the exact same dataset for a roughly 4x speed boost as expected by using 4 threads instead of 1.

Googling around suggests that there must be a shared resource in the myMap functor somewhere, but I can't find anything at all that's shared between the processing threads other than the original list of MyObjects that exist on the main thread.

So here's the question: Does the fact that MyObject was created in a different thread than the processing thread matter if I can guarantee that there are no synchronization issues? This link suggests it doesn't matter, but that heap memory block seems to be the only thing both threads share.

I'm running Qt 4.8.6 on Windows 7 Pro x64 with an i7 processor.

Most likely, the heap mutex is the shared resource. If your functors create or destroy objects on the heap, that's your problem. — Kuba hasn't forgotten Monica, Feb 08 '16 at 21:02
Interesting. So there might be a mutex blocking asynchronous access to the heap itself? To be clear, the functors do no creation or destruction on the heap - just some work on the stack and _modification_ of the MyObjects on the heap. — Phlucious, Feb 08 '16 at 21:09
@KubaOber: I commented out all lines that modify the heap objects but left in lines that read members of the heap objects. The slowdown still exists, so it's not the modification of MyObject that seems to matter. — Phlucious, Feb 08 '16 at 21:14
@KubaOber: As it turns out, I'm fairly certain you were right. My slowdown appears to have come from repeated _reallocations_ (building, re-sorting, and purging) of sub-lists of pointers within myMap. Apparently I did it enough times that the allocations mattered even though I wasn't calling `new` or `delete` on anything, and even though each thread should have its own heap. http://stackoverflow.com/questions/4859263/can-multithreading-speed-up-memory-allocation — Phlucious, Feb 09 '16 at 18:07
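
For illustration only, here is a minimal sketch of the kind of change the last comment points toward. It uses standard C++ (std::vector and std::thread) rather than Qt so that it stands alone. Item, sumEvenFresh, sumEvenReused and runChunks are hypothetical names, not the asker's code. The first worker builds a fresh temporary list of pointers on every pass, so each pass goes through the allocator. The second reserves one scratch buffer per worker and reuses it. Both give the same result. Whether the second is actually faster depends on the allocator and platform, so it needs to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for MyObject: a single readable field.
struct Item {
    int value;
};

// Allocation-heavy variant: a new temporary list is built on every pass,
// so every pass allocates, grows, and frees heap memory.
long sumEvenFresh(const std::vector<Item*>& chunk, int passes) {
    long total = 0;
    for (int p = 0; p < passes; ++p) {
        std::vector<Item*> selected;  // allocated and freed each pass
        for (Item* it : chunk)
            if (it->value % 2 == 0) selected.push_back(it);
        for (Item* it : selected) total += it->value;
    }
    return total;
}

// Reuse variant: one scratch buffer per call, reserved once up front.
// clear() drops the elements but keeps the capacity, so the passes
// themselves do not need to allocate.
long sumEvenReused(const std::vector<Item*>& chunk, int passes) {
    long total = 0;
    std::vector<Item*> scratch;
    scratch.reserve(chunk.size());
    for (int p = 0; p < passes; ++p) {
        scratch.clear();
        for (Item* it : chunk)
            if (it->value % 2 == 0) scratch.push_back(it);
        for (Item* it : scratch) total += it->value;
    }
    return total;
}

// Runs `worker` on each chunk in its own thread, then adds up the
// per-chunk results on the calling thread (a sequential reduce).
long runChunks(const std::vector<std::vector<Item*> >& chunks, int passes,
               long (*worker)(const std::vector<Item*>&, int)) {
    assert(passes >= 0);
    std::vector<long> partial(chunks.size(), 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < chunks.size(); ++i)
        threads.emplace_back([&, i] { partial[i] = worker(chunks[i], passes); });
    for (std::thread& t : threads) t.join();
    long total = 0;
    for (long v : partial) total += v;
    return total;
}
```

Timing runChunks with each worker on the same chunks, at one thread and then at four, would show whether allocation inside the worker is what limits scaling in a given setup.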
