htop displays multiple cores in use for one process without use of multithreading

Question

I wrote a C++ program that does not use multithreading, but is compiled with one of the -O2, -O3 or -Ofast flags to optimize for speed.

The program is named mir.out and run from the bash script my_script. Run time is determined with:

time bash my_script

The output is

real    18m26.001s
user    56m4.507s
sys     91m14.536s

From this I concluded that multiple cores are running in parallel (I have 8 cores), since the user and sys times are much larger than the real time. Is this correct?

I checked this with htop and found the following

htop with filter output

which indicates that all cores are used by the program. When the program is finished the

htop output

shows only minor use of the cores.

I did not use any function from the STL that can be parallelized (see Parallelization in STL), nor did I find any flags implied by -O2, -O3 or -Ofast that would enable automatic parallelization (see compiler flags for optimization).

So my question is, why is my program using multiple cores?

Edit: Thanks to the comments I found the answer:

I used the Armadillo library for linear algebra. I thought it needed OpenMPI for multithreading, but some research showed that this is already possible through BLAS and LAPACK.

What makes you think your program runs in parallel on multiple cores? — Ron, Feb 14 '18 at 14:05
@Ron: That the total user/sys are significantly higher than "real" (aka "wall clock time") is a pretty good sign that the process is running multiple threads. A single threaded application will have real >= user + sys (give or take some measurement errors) — Mats Petersson, Feb 14 '18 at 14:10
Of course, since `my_script` isn't shown to us, it's impossible to say WHAT happens inside that script, and what part of it takes what time. — Mats Petersson, Feb 14 '18 at 14:11
@MatsPetersson I see. That explains it for me. Appreciate it. — Ron, Feb 14 '18 at 14:12
It's impossible to answer WHY your application is using multiple threads, without understanding what your code looks like (and plausibly not then either!). Maybe you're calling on a library that splits the work into multiple cores. Generally, the compiler will not do this, but there are plenty of libraries that automatically detect multiple cores and spread the work over these. Compiler options probably don't make any difference here. — Mats Petersson, Feb 14 '18 at 14:18
My first guess would be that you have started the same application multiple times. Are you sure that you are only running one instance of the program? — Johan, Feb 14 '18 at 17:06
Thanks for the comments, it really helped me find the answer. How can I mark this question as answered/remove it (since it has no real value to anyone else)? — Stefan, Feb 15 '18 at 12:37

score 0 · Accepted Answer · answered Feb 19 '18 at 13:27

Thanks for the comments on possible reasons for the multi-core use of my program. I found the following to be the cause:

I used Armadillo (a linear algebra library for C++) in my program. It parallelizes some of its functions without using MPI, relying instead on LAPACK and BLAS (as stated in this question).
