
The process running the following code crashes with a Segmentation fault:

```cpp
#include <stdlib.h>
#include <iostream>
#include <pthread.h>

void* f( void* )
{
   while( true )
   {
      // It crashes inside this call (with cerr, too).
      std::cout << 0;
   }
   return NULL;
}

int main()
{
   pthread_t t;
   pthread_create( &t, NULL, &f, NULL );

   while( true )
   {
      // It crashes with any script/app; true is just simple.
      system( "true" );
   }
   return 0;
}
```

It crashes on roughly every other run, within a few seconds (by then the output contains anywhere from thousands to millions of '0's). With the code above, the crash occurs a few functions deep inside the cout << 0 call. Depending on which extra functions are called or what data is put on the stack in f(), it crashes in different places. In gdb, the stack sometimes doesn't make sense with regard to the order of the function calls. From this I deduce that the stack is corrupted.

I found there are some problems with multi-threaded applications calling fork() (see also two of the comments mentioning stack corruption). A forked child inherits the parent's open file descriptors, and they stay open across exec unless FD_CLOEXEC is set. However, my program creates no file descriptors explicitly. (I tried setting FD_CLOEXEC on fileno( stdout ) and fileno( stderr ), with no improvement.)
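For reference, setting FD_CLOEXEC on a descriptor can be sketched roughly as follows. This is an illustrative reconstruction using the standard fcntl() interface, not the exact code that was tried; the helper name set_cloexec is hypothetical.

```cpp
#include <fcntl.h>
#include <stdio.h>

// Hypothetical helper (not from the original program): sets FD_CLOEXEC on fd
// while preserving any other descriptor flags.
// Returns 0 on success, -1 on failure (errno is set by fcntl).
int set_cloexec( int fd )
{
   int flags = fcntl( fd, F_GETFD );
   if( flags == -1 )
      return -1;
   return fcntl( fd, F_SETFD, flags | FD_CLOEXEC ) == -1 ? -1 : 0;
}
```

It would be called as set_cloexec( fileno( stdout ) ) and set_cloexec( fileno( stderr ) ) at the start of main(), before the thread is created.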

Even without explicit file descriptors, can I not mix threads and fork()? Do I simply need to replace the system() call with equivalent functionality? Or is there a kernel bug that causes this crash and that was fixed after 2.6.30?
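To make "equivalent functionality" concrete, one possible shape for a system() replacement is posix_spawnp() followed by waitpid(), sketched below. The helper name run_command is hypothetical. How the C library implements posix_spawnp() internally (fork, vfork, or clone) depends on its version, so this sketch is not claimed to avoid the crash; it only shows what such a replacement might look like.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>
#include <cstddef>

extern char** environ;

// Hypothetical helper: runs argv[0] (looked up via PATH) with the given
// NULL-terminated argument vector and waits for it to finish.
// Returns the child's exit status, or -1 if the child could not be
// spawned, could not be waited for, or did not exit normally.
int run_command( char* const argv[] )
{
   pid_t pid;
   if( posix_spawnp( &pid, argv[0], NULL, NULL, argv, environ ) != 0 )
      return -1;

   int status;
   while( waitpid( pid, &status, 0 ) == -1 )
   {
      // Retry if interrupted by a signal; give up on any other error.
      if( errno != EINTR )
         return -1;
   }
   return WIFEXITED( status ) ? WEXITSTATUS( status ) : -1;
}
```

Unlike system(), this does not go through sh, so a shell command line would have to be split into an argument vector by the caller.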

Other Details

I am running it on an ARM AT91 processor (armv5tejl) with Linux 2.6.30 (with some overlays and patches for my specific set of peripherals) compiled with GCC 4.3.2.

Linux 2.6.30 #1 Thu May 29 15:43:04 CDT 2014 armv5tejl GNU/Linux

I had been cross-compiling it with -g and -O0, but it still crashes without those flags:

arm-atmel-linux-gnueabi-g++ -o system_thread system_thread.cpp -lpthread

I've also tried the -fstack-protector-all flag. Sometimes it crashes in __stack_chk_fail(), but sometimes other function pointers or data get corrupted first and it crashes earlier.

The libraries it loads (from strace):

libpthread.so.0
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6

Note: Since it sometimes does not crash and is not really responsive to ^C, I typically run it in the background:

$ killall -9 system_thread; rm -f log; system_thread >log &

I have compiled this program for a few different architectures and Linux kernel versions, but I have not seen it crash anywhere else:

Linux 3.10.29 #1 Wed Feb 12 17:12:39 CST 2014 armv5tejl GNU/Linux
Linux 3.6.0-dirty #3 Wed May 28 13:53:56 CDT 2014 microblaze GNU/Linux
Linux 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014 x86_64 x86_64 GNU/Linux
Linux 3.8.0-35-generic #50~precise1-Ubuntu SMP Wed Dec 4 17:25:51 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

EDIT: Note that on the same architecture (armv5tejl) it does not crash with Linux 3.10.29. It also does not crash on an earlier version of my "appliance" (older server and client applications), which runs the same Linux version, 2.6.30. So the OS environment has some effect.

BusyBox v1.20.1 provides sh that system() calls.

Joel Ostraat
  • [The `system` function is not thread-safe](http://pubs.opengroup.org/onlinepubs/9699919799/functions/system.html), but I'm not sure that's your problem. – R.. GitHub STOP HELPING ICE May 31 '14 at 04:31
  • I think since you are using infinite loops that is why it may execute until the stack is full and then it will corrupt, try not using infinite loop. – smali May 31 '14 at 04:31
  • BTW I can't reproduce the crash. – R.. GitHub STOP HELPING ICE May 31 '14 at 04:34
  • infinite loops normally caused the entire system to become unresponsive. With the now-prevalent preemptive multitasking model, infinite loops usually cause the program to consume all available processor time, but can usually be terminated by the user – smali May 31 '14 at 04:35
  • @ali786 - These infinite loops have been simplified from my regular program, which is a server that continually processes messages from clients. Normally the loops are waiting on a socket receive or a semaphore. This is the simplest code that still crashes. – Joel Ostraat May 31 '14 at 11:58
  • @R.. - I'm not surprised you couldn't reproduce the crash. I've also tried it on half a dozen different architectures and it only crashes on one of them. – Joel Ostraat May 31 '14 at 12:03
  • Could be a gcc bug on that architecture. Also try `-pthread` instead of `-lpthread` for both compilation and linking. – n. m. could be an AI May 31 '14 at 12:27
  • @n.m. - It still crashes when compiling and linking with `-pthread`. – Joel Ostraat May 31 '14 at 12:50
  • @n.m. - GCC could have a bug on that architecture. However, it doesn't always crash on that architecture. See my edit above. – Joel Ostraat May 31 '14 at 12:52
  • It is also possible that you have a hardware problem that only manifests itself with a specific version of Linux. Your program by itself is OK. – n. m. could be an AI May 31 '14 at 14:13
  • Are you able to reproduce this issue under an emulator (e.g, qemu)? An intermittent crash after several seconds smells to me like it could be a hardware problem. –  Jun 01 '14 at 03:39

1 Answer


This is reproducible on an ARM processor running the 2.6.30 kernel you mentioned, but not on master. We can use git bisect to find where the bug was fixed (it took about 16 iterations). Because git bisect is designed to find regressions, and here master is "good" while an older version is "bad", the meanings of "good" and "bad" have to be swapped: mark the revisions that no longer crash as "bad" and the crashing ones as "good".

The bisection points to this commit, which fixes "an instance of userspace data corruption" involving fork(). The symptom it addresses closely matches what you describe, and the bug could also corrupt memory outside the stack. After backporting this commit and its required parent to the 2.6.30 kernel, the code you posted no longer crashes.

isotherm