This is a somewhat general question.
I have a segfault in a multithreaded program, and the backtrace (bt)
from the core dump is shown below:
(gdb) bt full
#0 0x0000000000441540 in try_dequeue<std::shared_ptr<Frame> > (item=<synthetic pointer>, this=0xbe3c50) at /root/projects/active/user/include/third_party/concurrentqueue.h:1111
nonEmptyCount = 0
best = 0x0
bestSize = 0
#1 ConsumerNice::listening_nice (this=0xbe3c40) at /root/projects/active/user/include/concurrency/consumer_nice.h:45
frame = std::shared_ptr (empty) 0x0
#2 0x00000000004c0530 in execute_native_thread_routine ()
No symbol table info available.
#3 0x00007f3eb3f81e65 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007f3ead70a88d in clone () from /lib64/libc.so.6
No symbol table info available.
So I looked at the source code. My code is as follows:
void listening_nice() {
    while (true) {
        std::shared_ptr<Frame> frame;
        if (nice_queue.try_dequeue(frame)) {
            on_frame_nice(frame);
        }
    }
}
and the relevant part of cameron314/concurrentqueue
looks like this:
bool try_dequeue(U& item)
{
    // Instead of simply trying each producer in turn (which could cause needless contention on the first
    // producer), we score them heuristically.
    size_t nonEmptyCount = 0;
    ProducerBase* best = nullptr;
    size_t bestSize = 0;
    for (auto ptr = producerListTail.load(std::memory_order_acquire); nonEmptyCount < 3 && ptr != nullptr; ptr = ptr->next_prod()) {
        auto size = ptr->size_approx();
        if (size > 0) {
            if (size > bestSize) {
                bestSize = size;
                best = ptr;
            }
            ++nonEmptyCount;
        }
    }
It doesn't seem possible for this code to cause a segfault, so I am wondering: does bt
always show the culprit thread? Or is there a chance the segfault is caused by some other problem in some other thread, or even by the operating system?
Note that this program runs on 3 identically configured machines, but only one of them crashes, once a day: on that machine the program runs for 3 straight hours and then crashes.
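
To make the "some other thread" possibility concrete: the backtrace shows where the invalid access happened, which is not necessarily where the fault originated. For example, if the object owning the queue were destroyed while the consumer thread was still looping, the crash would surface inside try_dequeue even though the destroying thread is at fault. Below is a minimal sketch of a consumer that rules this scenario out by stopping and joining its thread before the queue is destroyed. Everything in it is an assumption for illustration: SimpleQueue is a hypothetical mutex-based stand-in for the real concurrentqueue, and Frame, running_, handled_, worker_ and push are made-up names, not my actual code.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for the real Frame type.
struct Frame { int id; };

// Hypothetical mutex-based stand-in for the lock-free queue, used only so
// that this sketch compiles without the third-party header.
template <typename T>
class SimpleQueue {
public:
    void enqueue(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(item));
    }
    bool try_dequeue(T& item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        item = std::move(items_.front());
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<T> items_;
};

class ConsumerNice {
public:
    // The thread starts only after the queue and flags are constructed,
    // because worker_ is the last data member.
    ConsumerNice() : worker_([this] { listening_nice(); }) {}

    // Stop the loop and join BEFORE the members are destroyed, so the
    // consumer thread can never touch a dead queue.
    ~ConsumerNice() {
        running_.store(false, std::memory_order_release);
        worker_.join();
    }

    void push(std::shared_ptr<Frame> frame) { nice_queue.enqueue(std::move(frame)); }
    int handled() const { return handled_.load(); }

private:
    void listening_nice() {
        while (running_.load(std::memory_order_acquire)) {
            std::shared_ptr<Frame> frame;
            if (nice_queue.try_dequeue(frame)) {
                on_frame_nice(frame);
            } else {
                std::this_thread::yield();  // avoid spinning at full speed when idle
            }
        }
    }

    // Placeholder handler: only counts the frames it receives.
    void on_frame_nice(const std::shared_ptr<Frame>&) { handled_.fetch_add(1); }

    SimpleQueue<std::shared_ptr<Frame>> nice_queue;
    std::atomic<bool> running_{true};
    std::atomic<int> handled_{0};
    std::thread worker_;  // declared last: constructed last, joined in the destructor
};
```

The point of the sketch is the ordering: the worker is joined in the destructor body, before any member is destroyed. With a detached thread, or a while (true) loop that is never stopped, nothing guarantees that ordering at shutdown.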