I have written a parallel program using OpenMP. Since my machine has 8 cores, I spawn 8 threads. Running `sar -P ALL 1 20` shows that the I/O wait percentage (`%iowait`) is very high on all cores.

Based on another SO post, I found that callgrind is a good tool for profiling C++ applications, but it does not work for my code: I am using OpenBLAS, and valgrind complains that it is unable to recognize OpenBLAS functions.

Can someone please tell me how I could track down exactly where in my code the problem lies?

user1274878
  • What operating system is your machine using? – Thomas Matthews Apr 11 '14 at 21:40
  • Linux. Kernel version: 3.5.0 – user1274878 Apr 11 '14 at 22:11
  • Kinda pointless, you are not going to rewrite the operating system to make the I/O faster. So just benchmark it, try different number of threads and see which one wins. Pretty high odds that the optimal number of threads is 1, always good pause for "hmm, I'm doing it wrong". Keep I/O on one thread, process it on N threads. Good odds at arriving at 2 that way. Getting beyond 2, that requires knowing what actually is going on. We are not there yet from this question. – Hans Passant Apr 11 '14 at 22:34
  • Well, I have found that 8 threads is actually optimal. – user1274878 Apr 12 '14 at 01:21
  • Run it under GDB and type Control-C. Do `thread n` and `bt` to examine each thread's stack. If you halted it during I/O, you will see why it's doing it. If you didn't, do it again until you do. That's [*this method*](http://stackoverflow.com/a/378024/23771). – Mike Dunlavey Apr 12 '14 at 01:46

0 Answers