I'm trying to debug a Fortran MPI program. When I try to run it with 5 processes, I get a segmentation fault. Oddly enough, if I run the same program with fewer processes this doesn't happen.
When running the program with Valgrind (Memcheck) and analyzing the resulting core files (there are 3 of them) with GDB, I get the following output:
Core was generated by `'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000692a186 in poll () from /lib64/libc.so.6
(gdb) bt
#0 0x000000000692a186 in poll () from /lib64/libc.so.6
#1 0x000000000c47763b in btl_openib_async_thread () from /usr/mpi/intel/openmpi-1.4.3/lib/openmpi/mca_btl_openib.so
#2 0x000000000664a73d in start_thread () from /lib64/libpthread.so.0
#3 0x0000000006932f6d in clone () from /lib64/libc.so.6
And when I run the same program without Valgrind, the core files (there are now 4) return this (with different values for itistep for each core file):
Core was generated by `/home/me/myprogram.out /home/me/run'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002b04b3fa70f0 in ?? ()
(gdb) bt
#0 0x00002b04b3fa70f0 in ?? ()
#1 <signal handler called>
#2 0x00002b04aea65d4f in opal_memory_ptmalloc2_int_free () from /usr/mpi/intel/openmpi-1.4.3/lib/libopen-pal.so.0
#3 0x00002b04aea6a420 in opal_memory_ptmalloc2_free_hook () from /usr/mpi/intel/openmpi-1.4.3/lib/libopen-pal.so.0
#4 0x00002b04af5477f1 in free () from /lib64/libc.so.6
#5 0x00000000005afecc in for_dealloc_allocatable ()
#6 0x00000000005734a2 in mtd () at mtdk.for:533
#7 0x00000000004ecd76 in opt (itistep=1088970925, inamein=Cannot access memory at address 0x146e0
) at opt.for:494
#8 0x000000000042a074 in cali () at cali.f90:379
#9 0x00000000004183cc in main ()
The line pointed to by frame #6 (mtdk.for:533) looks like this:
if (allocated(done2d))DEALLOCATE (done2D)
In this program, done2d is a 2-dimensional, allocatable real array that gets allocated in the same subroutine. I don't see anything wrong between the allocation and the deallocate statement. I recompiled my program after adding status= to my deallocate statement, as someone suggested here, but I'm getting the same output.
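A guarded deallocation with a status check has roughly the following shape. This is a minimal, self-contained sketch, not the actual code from mtdk.for: the array bounds and the variable ierr are placeholders chosen for illustration, and the standard spelling of the specifier is stat=.

```fortran
program dealloc_stat_demo
  implicit none
  ! done2d mirrors the array name in the question; its bounds here are made up.
  real, allocatable :: done2d(:,:)
  ! ierr is a hypothetical status variable, not taken from the original code.
  integer :: ierr

  allocate(done2d(10, 20), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'

  done2d = 0.0

  ! Deallocate only if currently allocated, and capture the status code.
  if (allocated(done2d)) then
    deallocate(done2d, stat=ierr)
    if (ierr /= 0) print *, 'deallocate returned stat = ', ierr
  end if

  print *, 'allocated after deallocate: ', allocated(done2d)
end program dealloc_stat_demo
```

With stat= present, a deallocation error that the runtime detects is reported through ierr instead of terminating the program.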
I'm using Intel Fortran 11.1 with the following flags: -O3 -C -pg -traceback -g -warn interfaces, and I'm running my program on CentOS.
Typing ulimit or ulimit -s on the command line returns unlimited.
I don't know where to look next. Does anybody know how to use this information to get to the root of the problem?