1

I noticed that in my program linking with opencv spoiled the concurrent execution of threads. In my debugging endeavours, I found that even following a standard c++ thread-example, which doesn't use opencv, shows the same problem. In the code example below three threads are launched.

Perhaps somebody can explain the behaviour below and suggest what to do to have threads running concurrently in my openCV-enabed application.

I compile and link the sample code below in two ways, one without the opencv library ('mttok') and one called 'mttno', with opencv linked in (although it doesn't need it!)

g++ -o mttok -O3 main.cpp -lpthread 
c++ -o mttno -O3 main.cpp -lpthread -L/usr/local/opencv/lib64 -lopencv_core 

When I run 'mttok' all is as expected. The three threads run concurrently (as can be verified from a CPU-activity monitor, but it also can be deduced from the output of the linux time command):

==> time mttok
All threads are running...
result1: 5e+19
result2: 5e+19
result3: 5e+19
27.072u 0.000s 0:09.04 299.4%   0+0k 0+0io 0pf+0w

However, when I run the executable with opencv linked-in, the wall-time is about three times as high. The cpu-monitor shows only one 'cpu' at work.

==> time mttno
All threads are running...
result1: 5e+19
result2: 5e+19
result3: 5e+19
26.829u 0.163s 0:26.53 101.6%   0+0k 0+0io 0pf+0w

I do not understand this behaviour. What can be done? Thanks in advance. Bertwim

Sample code follows now.

#include <iostream>
#include <thread>   

const unsigned long NMAX=10000000000;

class MTTest
{
public:
   void foo( double& r )
   {
      double s = 0;
      for (unsigned long u=0; u<NMAX; u++)
      {
         s += u;
      }
      r = s;
   }
};

int main()
{
   double s1, s2, s3;

   std::unique_ptr<MTTest> ptr1( new MTTest );
   std::unique_ptr<MTTest> ptr2( new MTTest );
   std::unique_ptr<MTTest> ptr3( new MTTest );

   std::thread t1( &MTTest::foo, ptr1.get(), std::ref(s1) );
   std::thread t2( &MTTest::foo, ptr2.get(), std::ref(s2) );
   std::thread t3( &MTTest::foo, ptr3.get(), std::ref(s3) );

   std::cout << "All threads are running..." << std::endl;

   // synchronize threads:
   t1.join();
   t2.join();
   t3.join();

   std::cout << "result1: " << s1 << std::endl;
   std::cout << "result2: " << s2 << std::endl;
   std::cout << "result3: " << s3 << std::endl;

   return 0;
}
// ---- eof ---

New info: strace output

As suggested I have compared the output of the strace command. This may give a lead, as there are some notable differences. Most differences are just addresses, which I have ignored. But 2 DIFF's stand out, which I have extracted from the output. Below follows first a part of the "not-ok"-version, followed by the corresponding output of the "ok"-version. The DIFF's have been indicated. The "not-ok"-version has more futex-calls, partly with different outcomes, and it shows something about the cpu's. The ok-output does not show this type of info. Unfortunately, I cannot interpret this.

brk(NULL)                               = 0x1edc000
brk(0x1efd000)                          = 0x1efd000
++++ BEGIN DIFF 1 +++
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 4), ...}) = 0
fstat(1, {st_mode=S_IFREG|0664, st_size=23324, ...}) = 0
fstat(2, {st_mode=S_IFREG|0664, st_size=23381, ...}) = 0
sched_getaffinity(0, 512, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) = 64
shmget(IPC_PRIVATE, 4096, IPC_CREAT|0666) = 3932170
shmat(3932170, NULL, 0)                 = 0x7f3fbb62b000
shmctl(3932170, IPC_RMID, NULL)         = 0
shmget(0x510510, 4096, 0666)            = -1 ENOENT (No such file or directory)
shmget(0x510510, 4096, IPC_CREAT|0666)  = 3964939
shmat(3964939, NULL, 0)                 = 0x7f3fbb62a000
open("/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents(3, /* 30 entries */, 32768)    = 864
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
sched_getaffinity(0, 8, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) = 8
open("/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents(3, /* 10 entries */, 32768)    = 312
openat(AT_FDCWD, "/sys/devices/system/node/node0/cpumap", O_RDONLY) = 4
read(4, "fff\n", 160)                   = 4
close(4)                                = 0
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "041\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "082\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "104\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "208\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu4/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "410\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu5/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "820\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu6/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "041\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu7/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "082\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "104\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "208\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu10/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "410\n", 160)                   = 4
close(3)                                = 0
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu11/cache/index1/shared_cpu_map", O_RDONLY) = 3
read(3, "820\n", 160)                   = 4
close(3)                                = 0
sched_setaffinity(0, 128, [0])          = 0
set_mempolicy(MPOL_INTERLEAVE, [0x0000000000000001], 64) = 0
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f3fb5899000
mprotect(0x7f3fb589a000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f3fb6098fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3fb60999d0, tls=0x7f3fb6099700, child_tidptr=0x7f3fb60999d0) = 19550
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f3fb3098000
mprotect(0x7f3fb3099000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f3fb3897fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3fb38989d0, tls=0x7f3fb3898700, child_tidptr=0x7f3fb38989d0) = 19551
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f3fb0897000
mprotect(0x7f3fb0898000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f3fb1096fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3fb10979d0, tls=0x7f3fb1097700, child_tidptr=0x7f3fb10979d0) = 19552
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f3fae096000
mprotect(0x7f3fae097000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f3fae895fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3fae8969d0, tls=0x7f3fae896700, child_tidptr=0x7f3fae8969d0) = 19553
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f3fab895000
mprotect(0x7f3fab896000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f3fac094fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3fac0959d0, tls=0x7f3fac095700, child_tidptr=0x7f3fac0959d0) = 19554
==== END DIFF 1 ===
futex(0x7f3fb9f0305c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f3fb9f03068, FUTEX_WAKE_PRIVATE, 2147483647) = 0
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f3fa9094000
mprotect(0x7f3fa9095000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f3fa9893fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3fa98949d0, tls=0x7f3fa9894700, child_tidptr=0x7f3fa98949d0) = 19555
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f3fa8893000
mprotect(0x7f3fa8894000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f3fa9092fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3fa90939d0, tls=0x7f3fa9093700, child_tidptr=0x7f3fa90939d0) = 19556
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f3fa8092000
mprotect(0x7f3fa8093000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f3fa8891fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f3fa88929d0, tls=0x7f3fa8892700, child_tidptr=0x7f3fa88929d0) = 19557
fstat(1, {st_mode=S_IFREG|0664, st_size=30221, ...}) = 0
write(1, "All threads are running...\n", 27All threads are running...
) = 27
++++ BEGIN DIFF 2 +++
futex(0x7f3fa98949d0, FUTEX_WAIT, 19555, NULL) = 0
futex(0x7f3fa90939d0, FUTEX_WAIT, 19556, NULL) = 0
futex(0x7f3fa88929d0, FUTEX_WAIT, 19557, NULL) = 0
write(1, "result1: 5e+19\n", 15result1: 5e+19
)        = 15
write(1, "result2: 5e+19\n", 15result2: 5e+19
)        = 15
write(1, "result3: 5e+19\n", 15result3: 5e+19
)        = 15
futex(0x7f3fb8bfad68, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f3fb8bfade8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f3fb8bfae68, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f3fb8bfae18, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f3fb8bfaee8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f3fb8bfaf68, FUTEX_WAKE_PRIVATE, 1) = 1
munmap(0x7f3fa9094000, 8392704)         = 0
munmap(0x7f3fa8893000, 8392704)         = 0
munmap(0x7f3fa8092000, 8392704)         = 0
munmap(0x7f3fb5899000, 8392704)         = 0
munmap(0x7f3fb3899000, 33554432)        = 0
munmap(0x7f3fb1098000, 33554432)        = 0
munmap(0x7f3fae897000, 33554432)        = 0
munmap(0x7f3fac096000, 33554432)        = 0
munmap(0x7f3fa9895000, 33554432)        = 0
shmctl(3964939, IPC_STAT, {shm_perm={uid=1000, gid=100, mode=0666, key=5309712, cuid=1000, cgid=100}, shm_segsz=4096, shm_cpid=19549, shm_lpid=19549, shm_nattch=1, shm_atime=1572184421, shm_dtime=0, shm_ctime=1572184421}) = 0
shmctl(3964939, IPC_RMID, NULL)         = 0
shmdt(0x7f3fbb62a000)                   = 0
shmdt(0x7f3fbb62b000)                   = 0
==== END DIFF 2 ===
exit_group(0)                           = ?
+++ exited with 0 +++

For the ok-ituation, 'strace ,.mttok' yields:

==> strace ./mttok
brk(NULL)                               = 0x1737000
brk(0x1758000)                          = 0x1758000
++++ DIFF 1 +++
==== DIFF 1 ===
futex(0x7ff40b14f05c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7ff40b14f068, FUTEX_WAKE_PRIVATE, 2147483647) = 0
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7ff409cbc000
mprotect(0x7ff409cbd000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7ff40a4bbfb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ff40a4bc9d0, tls=0x7ff40a4bc700, child_tidptr=0x7ff40a4bc9d0) = 19541
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7ff4094bb000
mprotect(0x7ff4094bc000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7ff409cbafb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ff409cbb9d0, tls=0x7ff409cbb700, child_tidptr=0x7ff409cbb9d0) = 19542
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7ff408cba000
mprotect(0x7ff408cbb000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7ff4094b9fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ff4094ba9d0, tls=0x7ff4094ba700, child_tidptr=0x7ff4094ba9d0) = 19543
fstat(1, {st_mode=S_IFREG|0664, st_size=14307, ...}) = 0
write(1, "All threads are running...\n", 27All threads are running...
) = 27
++++ DIFF 2 +++
futex(0x7ff40a4bc9d0, FUTEX_WAIT, 19541, NULL) = 0
write(1, "result1: 5e+19\n", 15result1: 5e+19
)        = 15
write(1, "result2: 5e+19\n", 15result2: 5e+19
)        = 15
write(1, "result3: 5e+19\n", 15result3: 5e+19
)        = 15
==== DIFF 2 ===
exit_group(0)                           = ?
+++ exited with 0 +++

New info2: strace -f

When the flag -f is added to the strace command, the additional output is overwhelming. In particular for the 'not-ok'-case, there are many *thousands of lines looking like this:

[pid  3836] mprotect(0x7f2b77446000, 8388608, PROT_READ|PROT_WRITE <unfinished ...>
[pid  3841] sched_yield( <unfinished ...>
[pid  3836] <... mprotect resumed> )    = 0
[pid  3841] <... sched_yield resumed> ) = 0
[pid  3836] clone( <unfinished ...>
[pid  3841] sched_yield( <unfinished ...>
[pid  3840] <... sched_yield resumed> ) = 0
[pid  3841] <... sched_yield resumed> ) = 0
[pid  3836] <... clone resumed> child_stack=0x7f2b77c44fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f2b77c459d0, tls=0x7f2b77c45700, child_tidptr=0x7f2b77c459d0) = 3844
[pid  3841] sched_yield( <unfinished ...>
[pid  3840] sched_yield( <unfinished ...>
[pid  3836] fstat(1,  <unfinished ...>
[pid  3841] <... sched_yield resumed> ) = 0
[pid  3836] <... fstat resumed> {st_mode=S_IFREG|0664, st_size=37237, ...}) = 0
[pid  3841] sched_yield( <unfinished ...>
[pid  3840] <... sched_yield resumed> ) = 0
[pid  3836] write(1, "All threads are running...\n", 27All threads are running...
 <unfinished ...>
[pid  3841] <... sched_yield resumed> ) = 0
[pid  3836] <... write resumed> )       = 27
[pid  3841] sched_yield( <unfinished ...>
[pid  3836] futex(0x7f2b78c479d0, FUTEX_WAIT, 3842, NULL <unfinished ...>
[pid  3841] <... sched_yield resumed> ) = 0
[pid  3840] sched_yield( <unfinished ...>
[pid  3841] sched_yield( <unfinished ...>
[pid  3840] <... sched_yield resumed> ) = 0
[pid  3841] <... sched_yield resumed> ) = 0
[pid  3840] sched_yield( <unfinished ...>
[pid  3840] <... sched_yield resumed> ) = 0
[pid  3841] <... sched_yield resumed> ) = 0
[pid  3840] sched_yield( <unfinished ...>
--- 
and many thousands lines like these to follow...
Bertwim van Beest
  • 1,311
  • 2
  • 9
  • 7
  • Why use 2 different compilers? – Mark Setchell Oct 24 '19 at 15:10
  • Ah, that is just a typo. In both cases it is g++. The observation remains the same (I checked). – Bertwim van Beest Oct 24 '19 at 16:14
  • Can you compare the output of `strace` in each case? – Davis Herring Oct 26 '19 at 04:55
  • Davis: new info added (strace command). This may give a lead? – Bertwim van Beest Oct 27 '19 at 15:24
  • @BertwimvanBeest: You have to use “@“ to address a reply. The trace shows that OpenCV is starting several additional threads; you may need to pass `-f` to `strace` to get (a lot of) output from the threads. – Davis Herring Oct 27 '19 at 20:53
  • @Davis Herring: I have added some from the produced by the strace -f command. Some additional 17000 lines with suspect output is the result... Not sure what they mean, though. – Bertwim van Beest Oct 28 '19 at 21:23
  • @BertwimvanBeest: Well, [`sched_yield`](http://man7.org/linux/man-pages/man2/sched_yield.2.html) is a special-purpose tuning function that most applications shouldn’t be calling. OpenCV has done [strange things with it](https://stackoverflow.com/q/30248295/8586227) before, perhaps because it expects to control all threads in the process with TBB. Apparently it can be built without such support/behavior. – Davis Herring Oct 29 '19 at 02:14
  • @Davis Herring. At least this is a lead, something I can dive into. THanks for pointing me to this. – Bertwim van Beest Oct 29 '19 at 06:40

0 Answers0