4

I'm working on a benchmark application which uses a user-defined number of threads to do the processing. I'm also working on a visualizer application for the benchmark results.

The benchmark itself is written in C++ (and uses pthreads for the threading) while the visualizer is written in Python.

Right now, what I'm doing to make the two talk is piping stdout from the benchmark to the visualizer. This has the advantage of being able to use a tool like netcat to run the benchmark on one machine and the visualizer on another.

[High-level diagram of the application]

A bit about the benchmark:

  • It is very CPU bound
  • Each thread writes important data (i.e. data that I need for the visualizer) every few tens of milliseconds.
  • Each datum printed is a line of 5 to 20 characters.
  • As stated previously, the number of threads is highly variable (can be 1, 2, 40, etc.)
  • Although it is important that the data isn't mangled (e.g. one thread preempting another in the middle of a printf/cout, so that its output is interleaved with another thread's), it's not very important that the writes happen in the correct order.

Example regarding the last point:

// Thread 1 prints "I'm one\n" at the 3 seconds mark
// thread 2 prints "I'm two\n" at the 4 seconds mark

// This is fine
I'm two
I'm one

// This is not
I'm I'm one
 two

In the benchmark, I switched from std::cout to printf because it is closer to a single write(2) call, to minimize the chance of interleaving between the outputs of different threads.

I'm worried that writing to stdout from multiple threads will become a bottleneck as the number of threads increases. It is quite important that the output-for-visualization part of the benchmark is extremely light on resources so as not to skew the results.

I'm looking for ideas on an efficient way of making my two applications talk without impacting the performance of my benchmark more than absolutely necessary. Any ideas? Have any of you tackled problems like this before? Any smarter/cleaner solutions?

F. P.
  • 5,018
  • 10
  • 55
  • 80
  • stdout to a console can well be a bottleneck. I ran into a similar problem and got [this](http://stackoverflow.com/q/11558540/1504523) nice answer. Consider a pipe service for the output. Clients are posting messages into a named pipe and a pipe server does the output. So there is only one thread doing the output. – Arno Aug 29 '12 at 17:13
  • 1
    On Linux the underlying `write(2)` system call (called by printf, cout, fwrite, etc.) is atomic. That is, if two different threads call write unsynchronized on the same fd, the output will never be interleaved. That doesn't guarantee that printf and so on do not call write multiple times. Consider using `write(2)` directly if performance is critical. – Andrew Tomazos Aug 29 '12 at 17:34
  • 1
    Another thought is (if the test is not memory bound, and you are not interested in realtime results) why not write your test results directly to process memory while the test is running and then copy them out at your leisure once the test is complete. – Andrew Tomazos Aug 29 '12 at 17:37
  • @AndrewTomazos-Fathomling - Thank you very much for your thoughts. I won't be able to apply your last suggestion since I need the realtime results. – F. P. Aug 29 '12 at 20:45
  • @FranciscoP.: Ok so it sounds like you are worried that `write(2)` will block and cause unwanted synchronization between threads. In that case I would suggest that your threads put their output timestamped into an in-process blocking queue, and then you have a single thread taking from that queue and calling `write(2)`. At worst the queue will backup in memory and you will have to wait for your results - but your python script should look at the timestamp and not the time it received the message. – Andrew Tomazos Aug 29 '12 at 21:55
  • Also see [Is cout synchronized/thread-safe?](https://stackoverflow.com/q/6374264/608639) – jww May 27 '18 at 11:35

4 Answers

3

Writing to stdout is very unlikely to be a performance bottleneck for any real-world problem. If it is, you are either logging too much or benchmarking a task so fast that it can't be measured against the background noise. It is, however, a thread-safety bug. Your choice of printf vs. cout is just voodoo: neither is thread-safe. If you want to use buffered I/O in a multithreaded environment, you need to serialize the calls yourself (using a pthread_mutex_t, a queue guarded by a semaphore, etc.). If you want to rely on system-call atomicity to do this for you (internally, the kernel does exactly the same kind of serialization), you need to make the system call yourself and not rely on printf being "close to" write.
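A minimal sketch of the mutex approach described above (`log_line` is an illustrative name, not something from the question):

```cpp
#include <cstdarg>
#include <cstdio>
#include <pthread.h>

static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

// printf-style logging, serialized so two threads can never
// interleave their output mid-line. Returns the character count
// written (vprintf's return value), or a negative value on error.
int log_line(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    pthread_mutex_lock(&log_mutex);   // only one thread formats/prints at a time
    int n = vprintf(fmt, ap);
    pthread_mutex_unlock(&log_mutex);
    va_end(ap);
    return n;
}
```

Note that the mutex is held across the formatting too, which is simple but means threads can block each other while formatting; formatting into a local buffer first would shrink the critical section.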

Andy Ross
  • 11,699
  • 1
  • 34
  • 31
  • Thanks for your answer. I was aware of the thread-unsafety of printf/cout, "minimizing" was the key word. (I'm aware that this is highly implementation dependent) – F. P. Aug 29 '12 at 17:19
  • 1
    A simple solution would be to create a `streambuf` which was initialized with an fd, and never wrote except when explicitly flushed. Then use one per thread, always with the same fd. The call to `write` is thread safe and atomic. – James Kanze Aug 29 '12 at 17:20
  • Why let it be implementation dependent? If you are writing in c++, you have stl. If you have stl, you have queue. And you're already using pthreads. There are ways to make it thread safe and implementation independent. – John Watts Aug 29 '12 at 17:38
  • @AndrewTomazos You are right. I must have been thinking of Java's BlockingQueue which is, of course, a different beast. Embarassing. – John Watts Aug 31 '12 at 11:07
0

All the threads could push their output lines as strings to a queue while another thread pulls them off and logs them (single-threaded, buffered output, flushing less frequently).
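A rough sketch of that queue, assuming pthreads since that's what the benchmark already uses (`LogQueue` and its members are made-up names):

```cpp
#include <pthread.h>
#include <queue>
#include <string>

// Blocking FIFO of log lines: worker threads push(), a single
// logger thread pop()s and does all the actual output.
class LogQueue {
public:
    LogQueue() {
        pthread_mutex_init(&m_, nullptr);
        pthread_cond_init(&cv_, nullptr);
    }
    void push(std::string line) {
        pthread_mutex_lock(&m_);
        q_.push(std::move(line));
        pthread_cond_signal(&cv_);   // wake the logger thread
        pthread_mutex_unlock(&m_);
    }
    std::string pop() {              // blocks until a line is available
        pthread_mutex_lock(&m_);
        while (q_.empty())
            pthread_cond_wait(&cv_, &m_);
        std::string line = q_.front();
        q_.pop();
        pthread_mutex_unlock(&m_);
        return line;
    }
private:
    pthread_mutex_t m_;
    pthread_cond_t cv_;
    std::queue<std::string> q_;
};
```

The workers only pay for a short lock and a string copy; the logger thread absorbs the cost of the actual writes.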

John Watts
  • 8,717
  • 1
  • 31
  • 35
  • The call to `write` should be thread safe (it is under Posix), so if that's all the second thread is doing, you might as well do it in the originating thread. – James Kanze Aug 29 '12 at 17:14
0

First, I'd make sure it was a problem before I worried about it. If the writes are only once every 10 or 20 milliseconds, it's likely that they won't bother anything.

Otherwise: the "write" actually consists of two operations: formatting the output, and physically outputting the formatted bytes. The second is probably fairly fast, since it is only a question of copying 5 to 20 characters from your process into the OS. (The OS will do the physical write once you've returned from the write/WriteFile function.) If you format locally, using std::ostrstream (deprecated, but should be available) or snprintf, formatting into a local char[], then calling write or WriteFile on the results, you don't need any external synchronization.

Alternatively, you can do all of the writing in a separate thread, just pushing requests (with the necessary data) into a queue (which is easily implemented using condition variables).
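The first suggestion (format locally, then make one write call) might look like this; `emit_datum` and its line format are hypothetical, not from the question:

```cpp
#include <cstdio>
#include <unistd.h>

// Format one datum into a thread-local buffer, then emit it with a
// single write(2). Since each line is a single syscall, no user-space
// synchronization is needed to keep lines from interleaving.
ssize_t emit_datum(int fd, int thread_id, double value) {
    char buf[64];  // ample for a 5-20 character line
    int len = std::snprintf(buf, sizeof buf, "%d %.3f\n", thread_id, value);
    if (len < 0 || len >= static_cast<int>(sizeof buf))
        return -1;  // formatting error or truncation
    return write(fd, buf, len);
}
```

Each thread would call this with `STDOUT_FILENO`; the formatting cost stays entirely within the calling thread.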

James Kanze
  • 150,581
  • 18
  • 184
  • 329
0

Assuming you have a POSIX compliant stdlib, each call to a stdio function is atomic with respect to other threads, so as long as you print out your lines with a single printf call, they won't get mixed together even if two threads write a line at the exact same time. The same is true for each iostream::operator<< call with C++, but if you write something like cout << "xxx " << var << endl;, that's three calls, not one.

If you want to use several calls to a stdio function and have it written as a single unit, you can use flockfile(3). For example:

flockfile(stdout);
printf("data: ");
print_struct(foo);  // a function that calls printf internally
printf("\n");
funlockfile(stdout);

This will cause everything from `data:` to the newline to be printed without allowing other threads to interleave their output. It's also useful with C++ iostreams:

flockfile(stdout);
cout << "data: " << x << endl;
funlockfile(stdout);
Chris Dodd
  • 119,907
  • 13
  • 134
  • 226