
In a standard setting where there is one module running with multiple threads, we can time the program using real time (a.k.a. wall-clock time) and thread time (total time spent in all threads used by the module). If the real time is low, then we have no problems. The program finished quickly and there's no need to optimize it. However, if the real time is high, we want to lower it, but we don't know what makes the program slow: the efficiency of the algorithm or the parallelization. Now, we can use the thread time to see what the time is used on. If the thread time is low, the parallelization needs to be optimized. If the thread time is high, the algorithm needs to be optimized.
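The two measurements can be taken side by side. As a minimal sketch (in Python, which is an assumption here, since the question doesn't name a language), `time.perf_counter()` gives real (wall-clock) time and `time.process_time()` gives CPU time summed over all threads of the process. Note that with CPython's GIL, pure-Python CPU-bound threads won't actually run in parallel; the point of the sketch is only the measurement, not the speedup:

```python
import threading
import time

def busy(n):
    # CPU-bound work so the threads accumulate thread (CPU) time
    s = 0
    for i in range(n):
        s += i * i
    return s

start_wall = time.perf_counter()
start_cpu = time.process_time()  # CPU time summed over all threads

threads = [threading.Thread(target=busy, args=(500_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"real (wall) time: {wall:.3f}s, thread (CPU) time: {cpu:.3f}s")
```

If `cpu` is much larger than `wall`, the work was genuinely spread over threads; if `wall` is high while `cpu` is low, the threads spent their time waiting rather than computing.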

Now, this is well known and has already been discussed to some extent in What do 'real', 'user' and 'sys' mean in the output of time(1)?

We run our program in a different setting. We have a huge amount of data, so we often need to save data to and load data from disk, because we can't keep it all in memory at once. To avoid IO as much as possible, we stream one data point at a time through several modules simultaneously. To clarify with an example: we have two modules, A and B, and some data D. The data is a collection of data points d1, d2, ... . Our pipeline is then defined as:

disk -> d1 -> A -> d1' -> B -> d1'' -> disk
disk -> d2 -> A -> d2' -> B -> d2'' -> disk

and so on.
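The pipeline above can be sketched with two worker threads connected by queues, one per stage junction. The transforms `module_a` and `module_b` below are hypothetical stand-ins for the real modules A and B, and the input list stands in for points streamed from disk:

```python
import queue
import threading

def stage(fn, inq, outq):
    # Pull one data point at a time, transform it, pass it downstream.
    while True:
        d = inq.get()
        if d is None:          # sentinel: end of stream
            outq.put(None)
            break
        outq.put(fn(d))

# Hypothetical per-point transforms standing in for modules A and B
module_a = lambda d: d * 2
module_b = lambda d: d + 1

q_in, q_ab, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(module_a, q_in, q_ab)).start()
threading.Thread(target=stage, args=(module_b, q_ab, q_out)).start()

for d in [1, 2, 3]:            # stands in for d1, d2, d3 read from disk
    q_in.put(d)
q_in.put(None)

results = []
while (r := q_out.get()) is not None:
    results.append(r)
print(results)  # [3, 5, 7]
```

While d2 is in A, d1 can already be in B, which is exactly why the per-module real-time measurements overlap.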

Now, to add an extra layer, we found that module B was slow, so we parallelized it, and it's super effective... except that we can no longer rely on our measurements of real time. Before, we had a timer for each module that was started before computing a given data point and suspended afterwards. Now, the real-time measurements of A and B overlap, because the modules run at the same time.

QUESTION

Does there exist a way to measure time for a streamed, parallelized system that makes it possible to reason about where to optimize, and whether to focus on the efficiency of the algorithm or the parallelization?

Little Helper
    How was module B parallelized? In which language was the module written? I'm not a unix expert, but I've often read about Valgrind (http://valgrind.org/) as the best tool to profile something. – user743414 Aug 30 '18 at 12:52

1 Answer


While pipelines add much value, a big issue with them is the identification and remediation of pipeline stalls. Among the many possible causes, one is differing speeds of the individual stages. For example, say the first stage runs faster and produces data every second; if the second stage is slow and cannot consume data every second, then either a queue will build up at the stage junction or the first stage will stop/stall until the second stage is done processing the previous data.

Depending on the implementation, detection can be done by monitoring either the interface queue or the idle/wait time of the stages. The remedy is almost always to run multiple concurrent instances of the slower stage. Another solution is to split the slow stage into two sequential but faster stages.
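The idle/wait-time approach can be sketched as follows: wrap each stage so it separately accumulates the time it spends blocked waiting for input (starved, so the bottleneck is upstream or in the parallelization) and the time it spends computing (busy, so the bottleneck is this stage's algorithm). Everything here (`TimedStage`, `slow_square`) is a hypothetical illustration, not a real API:

```python
import queue
import threading
import time

class TimedStage:
    """Records how long a stage waits for input vs. how long it computes.
    High wait time -> starved (look upstream / at parallelization);
    high busy time -> this stage's algorithm is the bottleneck."""
    def __init__(self, fn):
        self.fn = fn
        self.wait_time = 0.0
        self.busy_time = 0.0

    def run(self, inq, outq):
        while True:
            t0 = time.perf_counter()
            d = inq.get()                  # blocks while starved
            t1 = time.perf_counter()
            self.wait_time += t1 - t0
            if d is None:                  # sentinel: end of stream
                outq.put(None)
                break
            outq.put(self.fn(d))
            self.busy_time += time.perf_counter() - t1

def slow_square(d):
    time.sleep(0.01)                       # simulated slow per-point work
    return d * d

stage_b = TimedStage(slow_square)
q_in, q_out = queue.Queue(), queue.Queue()
worker = threading.Thread(target=stage_b.run, args=(q_in, q_out))
worker.start()
for d in range(20):                        # input is pre-filled, so B is never starved
    q_in.put(d)
q_in.put(None)
worker.join()
print(f"wait {stage_b.wait_time:.3f}s  busy {stage_b.busy_time:.3f}s")
```

Here the input queue is pre-filled, so busy time dominates and the conclusion would be to optimize the stage's algorithm; in a live pipeline, a stage whose wait time dominates is the one being starved by its neighbors.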

inquisitive