0

I have a Worker Thread that copes with heavy and long computations (up to tenth of seconds). These computations produce several thousands of QLines, representing the edges of a dynamically-growing tree. These edges can be modified anytime, since they connect the nodes of the trees by checking the cost, represented by the distance. I would like a smooth update of the QGraphicsScene containing the edges. I tried with signal and slots:

  • Worker thread emits a signal, so when the buffer is full this signal gets caught by the main thread, that will cope with the update/drawing of the line
  • This signal gets still caught by the main thread, but it seems it gets emitted very often, so QGraphicsView gets choked with QLine to be added
  • Changing the size of the buffer doesn't matter
  • Is there an alternative approach to this?

The main slot is:

void MainWindow::update_scene(bufferType buffer)
{
  for (int i = 0; i < buffer.size(); ++i)
  {
      if (buffer[i].first < (g_edges.size() - 1))
      {
          delete g_edges[buffer[i].first];
          g_edges[buffer[i].first] = scene->addLine(buffer[i].second);
      }
      else
          g_edges.push_back(scene->addLine(buffer[i].second));
   }
}

Note that bufferType is of type QList<std::pair<int,QLine>>. Here is the heavy computing part

while (T.size() < max_nodes_number && !_stop)
{
    const cnode random_node = rand_conf ();
    const cnode nearest_node = T.nearest_node (random_node);
    cnode new_node = new_conf (nearest_node, random_node);

    if (obstacle_free(nearest_node, new_node))
    {
        QList<cnode*> X_near = T.neighbours (new_node, max_neighbour_radius);
        cnode lowest_cost_node = nearest_node;
        qreal c_min = nearest_node.cost() + T.distance (nearest_node, new_node);

        for (int j = 0; j < X_near.size(); ++j)
        {
            if (obstacle_free(*X_near[j], new_node) && ((X_near[j]->cost() + T.distance (*X_near[j], new_node)) < c_min))
            {
                c_min = X_near[j]->cost() + T.distance (*X_near[j], new_node);
                lowest_cost_node = *X_near[j];
            }
        }

        T.add_node (new_node, lowest_cost_node.id());
        queue (new_node.id(), QLine (new_node.x(), new_node.y(), lowest_cost_node.x(), lowest_cost_node.y()));

        for (int j = 0; j < X_near.size(); ++j)
        {
            if (obstacle_free(*X_near[j], new_node) && (new_node.cost() + T.distance (new_node, *X_near[j])) < X_near[j]->cost())
            {
                queue (X_near[j]->id(), QLine (new_node.x(), new_node.y(), X_near[j]->x(), X_near[j]->y()));

                T.update_parent (*X_near[j], new_node.id());
                T.rewire_tree (X_near[j]->id());
            }
        }
    }
}
emit finished();

Please note that T is a class representing a Tree. It is constituted by some methods allowing to add a node, searching for the nearest one, etc. It has a QList<cnode> as private member, storing the tree's nodes. cnode is a structure constituted of two coordinates, an id, a parent, a cost, a list of its children.

Michael
  • 876
  • 9
  • 29
  • What type is `edges`? Did you profile where the hotspot is? Is it inside *QGraphicsScene::addLine* or in any of your defined functions? -- Generally: `removeAt(pos)` + `insert(pos)` for QList/QVector is expensive; this could already be a bottleneck. – kfunk Jan 03 '16 at 23:07
  • @kfunk ``edges`` is a ``QList``. The problem is in those two slots: they are called very frequently, and it seems that adding a single QLine to a Scene takes a lot of time. EDIT: even if I remove updateLine and use only drawLine, after a couple of seconds the GUI becomes irresponsive, so this problem is caused by the bug number of times that method is called. – Michael Jan 03 '16 at 23:09
  • How often do you re-compute? How many times is `drawLine` usually invoked whenever you have re-computed? Does the GUI stay responsive if you reduce the number of re-computations? -- Please note: Lots of guessing from my side involved without proper profiling data... – kfunk Jan 03 '16 at 23:19
  • @kfunk : check my edit – Michael Jan 03 '16 at 23:27
  • Still not sufficient. I suggest you run your program through a profiler and see where the real bottleneck is. If you're on Linux: try running the program through valgrind-callgrind + kcachegrind for visualisation. -- Guesses: When you're just using `drawLine` you may end up adding too many QGraphicsItems => QGraphicsScene chokes; When you use `updateLine` you may put a lot of stress on the QList (keep in mind inserting/removing elements at random positions is a *O(n)* operation for QList). Try `edges[i] = ...` here. – kfunk Jan 03 '16 at 23:36
  • @Michael - who taught you to put spacebar before function calls? That's a terrible habit. – dtech Jan 03 '16 at 23:48
  • I repeat, even skipping the part involving the addition of the edge to the QList does not solve the problem. Is there a way to enqueue the calls of the slots and slowly process them, without choking the thread? – Michael Jan 03 '16 at 23:55
  • @Michael there is a way, I already pointed it out in my answer. It doesn't matter how "slowly" you process signals, if you emit them frequently you inevitably flood the event loop. So you shouldn't do it for every single line. Do it every 50 milliseconds or something like that. – dtech Jan 03 '16 at 23:59
  • @ddriver : I know, the problem is, I do not know how to implement a reliable mechanism to keep the GUI responsive, without slowing the computational process, both adding new lines and replacing existing lines. Buffers introduces an inevitable slowness, since they are shared and they need a mutex to be safely accessed, so... – Michael Jan 04 '16 at 00:07
  • @Michael - you may avoid the mutex. Specifically for passing the result buffers - containers in Qt are COW - implicit sharing with copy on write, and the sharing itself is atomic so it is thread safe, when you pass the buffer and then clear, the clear as a write operation will detach the local container from the data passed to the main thread, the container copy you passed will not be cleared - it will persist. Resorting to mutexes will only increase your "hanging time". – dtech Jan 04 '16 at 00:20
  • @ddriver check my code update. Even modeled in this way, it doesn't work. The slot update_scene seems to get called still too many times – Michael Jan 04 '16 at 15:13
  • @Michael - then spread out the update flushing. As slow as queued connections are, you can still comfortably have hundreds per second which is more than enough for updating GUI. – dtech Jan 04 '16 at 15:58
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/99709/discussion-between-michael-and-ddriver). – Michael Jan 04 '16 at 16:08

1 Answers1

2

The solution is as usual - avoid frequent queued connections, as those are quite slow. Queued connections are a coarse grain construct and such be used as such.

Batch the work. In your scenario, you could aggregate the computed lines in a container, and only when it reaches a certain threshold, pass that container to the main thread to draw/update the lines. The threshold can be count, time or a combination of both, you don't want not updating if there are only a few results to update. You will need to expand on your design to split the while loop to run in the thread event loop instead of blocking so you can aggregate and pass updates periodically - something similar to this. This is always a good idea for workers that take time - you can monitor progress, cancel, pause and all sorts of handy stuff.

Those 2 lines look fishy:

    edges.removeAt(i);
    edges.insert (i, scene->addLine (l));

Then you remove and then insert - that's an invitation for potential costly reallocation, even without reallocation there is unnecessary copying involved. Instead of removing and inserting you can simply replace the element at that index.

In your case you might omit splitting the actual while loop. Just don't emit in the loop, instead do something like this (pseudocode):

while(...) {
    ...
    queue(new line)
    ...
    queue(update line)
    ...
    queue(final flush)
}

void queue(stuff) {
    stuffBuffer.append(stuff)
    if (stuffBuffer.size() > 50 || final_flush) {
        emit do_stuff(stuffBuffer) // pass by copy
        stuffBuffer.clear() // COW will only clear stuffBuffer but not the copy passed above
    } 
}

Or if it will make you feel better:

    copy = stuffBuffer
    stuffBuffer.clear()
    emit do_stuff(copy)

This way the two containers are detached from the shared data before the copy is emitted.

EDIT: After a long discussion I ended up proposing a number of changes to the design to improve performance (the queued connections were only one aspect of the performance problem):

  • alleviate the graphics scene - find a compromise between "one item per line" and "one item for all lines" solution, where each item handles the drawing of the lines of its direct children, balancing between the CPU time for adding items to the scene and redrawing items on data changes.

  • disable automatic scene updates, and instead control the scene update explicitly, this way the scene is not updated for each and every tiny change.

  • aggregate view commands in batches and submit the work buffer at a fixed interval to avoid queued signals overhead.

Community
  • 1
  • 1
dtech
  • 47,916
  • 17
  • 112
  • 190
  • The two _fishy_ lines aren't a problem, since if I simply use ``drawLine()`` without ``updateLine()`` and without adding the line to the list, GUI still becomes unresponsive. My original development was in fact a main thread, where I had to call QCoreApplication::processEvents() each time I added an object to keep the GUI responsive, slowing the computational part. I'd really like to have a computational part and a detached graphical parts, so the computational one does not depend on the graphical one, but it seems really tricky with Qt :/ – Michael Jan 03 '16 at 23:53
  • They *are* a problem even if they are not **the** problem. This will copy all elements after i one position back just to have them copied back to their original position on the next line. This can be avoided completely. Eventual memory reallocation will increase the cost even further. – dtech Jan 03 '16 at 23:57
  • Okay, I'll change that. Anyway, you suggested to use a buffer and to _pass that container to the main thread to draw/update the lines_ : so, I would need a buffer for the new lines, and a buffer to contain the indexes of the old lines to be replaced plus the new lines which will replace the old ones. – Michael Jan 04 '16 at 00:01
  • Yeah pretty much. Do the calculation in steps and once in a while flush the results to the main thread. A few years back I was trying to get perf boost using multithreading and finely grained queued connections like yours, and the multithreaded solution turned out 5 times slower than the single threaded, because of the overhead of frequent queued connections. The correct approach was to accumulate results into the dedicated threads and merge once all threads are done, then I got perfect scaling for the extra threads. – dtech Jan 04 '16 at 00:04
  • I'm still not be able to see an efficient way to achieve what I want. The buffer will increase in size, since I don't know how many items to add in each step (it really depends on how fast your processor is, I think). I could check the main buffer every 50 ms, add a certain maximum number of QLines (i.e. maximum 10), remove them from the buffer and check the number of items I have written so far (an index *i*). I also need to check the secondary buffer, because there could be old lines to remove: – Michael Jan 04 '16 at 00:22
  • if the index *i* is greater than the indexes inside the secondary buffer, I'll proceed with the removal of them along with the update of them. All this, in my humble opinion, will seriously slow down the computational part, since I'll need a mutex on the two buffers. Probably, computations in the main thread with QCoreApplication::processEvents() would be almost cheaper :/ – Michael Jan 04 '16 at 00:22
  • You do not need mutexes. Flooding the event loop will not be cheaper, it will kill your application even if you are not in a computationally heavy scenario. It can literally choke on nothing. – dtech Jan 04 '16 at 00:23
  • You wrote to avoid mutexes but the docs clearly state that *In addition, they are thread-safe (referred to qt containers) in situations where they are used as read-only containers by all threads used to access them.* They say read-only: so if the main thread is trying to clear the buffer and the worker thread is trying to add something in the same moment, what happens? – Michael Jan 04 '16 at 00:31
  • @Michael - the moment the worker attempts to modify the container, it will be detached from its existing storage and will have new storage allocated, so it will not touch the data of the copy you made. They will be two independent containers. That's COW - you cannot change shared data, if you attempt a non const op, that data is first copied, detached and only then the changes take place. Now if you pass by reference - that would be a different thing, as COW won't kick in in that case. – dtech Jan 04 '16 at 00:36
  • Since you call `clear()` after passing the copy, you immediately detach from the shared data, so it is no longer the same container, before the main thread reads it and the worker thread adds to it. So there is no room for a race condition. – dtech Jan 04 '16 at 00:40
  • Yeah, that's the problem, I usually pass by reference, in order to keep low the resources usage – Michael Jan 04 '16 at 00:40
  • It is not a problem, simply *don't pass by reference* - I myself find COW almost entirely useless, but **this** particular use case is perfect for it. With COW in Qt you can pass by value and it is about as efficient as passing by reference. And atomic COW will certainly be much faster than using mutexes, as it is practically lock free. – dtech Jan 04 '16 at 00:42
  • Unfortunately, the structure you suggested me in that long discussion has one big problem. When you add a new node, calling ``scene->addNode(n)``, where `n` is a pointer to a node, it accesses `n` 's children in order to print all the new lines. Since children are stored in a QVector/QList/whatever, a mutex IS needed, otherwise unexpected behaviour occurs, since children are read from main thread and they could be modified in worker thread. – Michael Jan 11 '16 at 12:18
  • @Michael - even if you are in a situation that can result in race conditions - the structure being a tree, you can only lock individual nodes so it doesn't stall all the safe access to other nodes. But it might be an issue of how you prepare the nodes/branches, ideally if possible you should only create a node only after that particular branch is completed, this way all writing takes place before the reading from the gui side. You go up the tree while you create the structure, and draw it as you come down from completed branches. – dtech Jan 11 '16 at 12:43
  • I assume what you are doing right now is "create node, draw node, create children..." - instead you should do "create node, create children..., draw node" - this way you will draw when you come down from an already completed branch, so there is no danger of data being changed while you draw the branch. The same thing can be done for updating nodes as well. – dtech Jan 11 '16 at 12:50
  • Here come problems again: since it's a random algorithm there is no way to predict if that branch will be touched only the first 2 or 3 times or also at the moment of the creation of the 25000th node. So I'll need to use a QMutexLocker on the nodes, partially blocking the creation/update, since I need to use a mutex when I modify the node's edge, when I add/remove a child, etc. – Michael Jan 11 '16 at 12:52
  • @Michael - even if the tree renders the wrong data - it will only be momentary, if you schedule everything in the right order, the visualization of any incorrect data should be updated to correctness almost instantly. Race conditions are really only dangerous when they can break stuff, but in your cause you will just get a momentary glitch in the GUI that will likely not even be noticeable. Take care to push changes in the correct order and build and update the tree in the optimal order and you won't have to lock anything. – dtech Jan 11 '16 at 12:59
  • let me explain better. Since the algorithm is random, it will start from starting node. Then it will create a random node, check the nearest one, check if the nearest one is the cheapest one, and then proceed to add a new node near the cheapest one. So, really, I have no way to know if a branch is complete or not. A branch can stop growing immediately or after thousands of iterations. That's why, I think, I'll need a locker, not because of the possible glitches in the UI, but for problems related to the QList/QVector of children, possibly being resized while being read. – Michael Jan 11 '16 at 13:08
  • OK, even if you have to lock, just lock individual nodes not the whole tree, otherwise your perf will plummet. – dtech Jan 11 '16 at 13:18
  • Ok, I'll revise the structure since I also need to delete the old drawn line before drawing the new one (this is the update case). – Michael Jan 11 '16 at 17:34