I wrote a C++ program that does not use multi-threading. It is compiled with -O2, -O3 or -Ofast to optimize for speed.
The program is named mir.out and is run from the bash script my_script. The run time is measured with:
time bash my_script
The output is
real 18m26.001s
user 56m4.507s
sys 91m14.536s
From this I concluded that multiple cores are running in parallel (I have 8 cores), since the user and sys times are much larger than the real time. Is this correct?
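As a rough check of that reasoning, the ratio (user + sys) / real estimates how many cores were busy on average. The sketch below only redoes this arithmetic; the function names are made up for illustration and are not part of my program.

```cpp
#include <cassert>

// Converts the "XmY.YYYs" format printed by `time` into seconds.
double to_seconds(int minutes, double seconds) {
    return 60.0 * minutes + seconds;
}

// Average number of cores kept busy over the run:
// total CPU time (user + sys) divided by wall-clock time (real).
// (Hypothetical helper, named here only for illustration.)
double average_busy_cores(double real_s, double user_s, double sys_s) {
    assert(real_s > 0.0);  // a zero or negative wall time makes no sense
    return (user_s + sys_s) / real_s;
}
```

Called as average_busy_cores(to_seconds(18, 26.001), to_seconds(56, 4.507), to_seconds(91, 14.536)), this gives about 7.99, which is consistent with all 8 cores being busy for almost the whole run.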
I checked this with htop, which indicates that all cores are used by the program while it runs. When the program has finished, htop shows only minor use of the cores.
I did not use any STL function that can be parallelized (see Parallelization in STL), nor did I find any flag implied by -O2, -O3 or -Ofast that would enable automatic parallelization (see compiler flags for optimization).
So my question is, why is my program using multiple cores?
Edit: Thanks to the comments I found the answer:
I used the Armadillo library for linear algebra. I thought it needed OpenMP for multi-threading, but some research showed that this is already possible with BLAS and LAPACK alone.
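To find out which part of a program is multi-threaded behind the scenes, one option is to compare CPU time with wall-clock time around a single call. This is a minimal sketch, assuming a Linux-like system where std::clock() sums the CPU time of all threads in the process (on Windows it behaves differently). The names cpu_to_wall_ratio and busy_loop are hypothetical helpers, and busy_loop is only a single-threaded stand-in for a real linear-algebra call.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Runs `work` once and returns (CPU time) / (wall-clock time).
// Assumption: std::clock() counts CPU time of all threads in the process.
// A result near 1 suggests one busy core; a result near N suggests N.
template <typename Work>
double cpu_to_wall_ratio(Work&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    const double cpu_s =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    const double wall_s =
        std::chrono::duration<double>(wall_end - wall_start).count();
    assert(wall_s > 0.0);  // the measured work must take some time
    return cpu_s / wall_s;
}

// Hypothetical single-threaded workload used as a placeholder.
inline double busy_loop(long iterations) {
    volatile double acc = 0.0;  // volatile keeps the loop from being optimized away
    for (long i = 0; i < iterations; ++i) {
        acc = acc + 1e-9 * static_cast<double>(i);
    }
    return acc;
}
```

For example, cpu_to_wall_ratio([] { busy_loop(50000000L); }) should stay close to 1 or below, while wrapping a large matrix product that is backed by a multi-threaded BLAS would be expected to give a clearly larger value.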