I have implemented a sample MPI application with a producer and a consumer. The producer runs on the process with rank 0, and a consumer runs on every non-zero rank. Each consumer spawns threads to process the messages generated by the producer: one receiver thread and several worker threads.
The consumer's receiver thread calls recv and, on receiving a message, passes it to a consumer worker thread, which performs the computation and sends the processed message back to the producer (root).
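To make the structure concrete, here is a minimal sketch of the consumer side as described above. It is not my actual code: the MPI calls are deliberately left out and replaced with synthetic messages, it uses std::thread instead of Poco threads, and the names Message (reduced to two ints here), MessageQueue, and run_consumer are hypothetical stand-ins for illustration only.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Message type used in the real application.
struct Message {
    int id;
    int payload;
};

// Minimal blocking queue shared by the receiver thread and the workers.
class MessageQueue {
public:
    void push(const Message& m) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(m);
        }
        ready_.notify_one();
    }

    // Signals that no more messages will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a message is available.
    // Returns false once the queue is closed and fully drained.
    bool pop(Message& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Message> queue_;
    bool closed_ = false;
};

// Runs one receiver thread and n_workers worker threads over n_messages
// synthetic messages. Returns the processed payloads (order unspecified).
std::vector<int> run_consumer(int n_messages, int n_workers) {
    assert(n_workers > 0);
    MessageQueue inbox;
    std::vector<int> results;
    std::mutex results_mutex;

    // Receiver: in the real program this loop blocks in recv from rank 0.
    std::thread receiver([&] {
        for (int i = 0; i < n_messages; ++i) inbox.push(Message{i, i});
        inbox.close();
    });

    // Workers: in the real program the result is sent back to rank 0.
    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; ++w) {
        workers.emplace_back([&] {
            Message m;
            while (inbox.pop(m)) {
                int processed = m.payload * 2;  // placeholder computation
                std::lock_guard<std::mutex> lock(results_mutex);
                results.push_back(processed);
            }
        });
    }

    receiver.join();
    for (auto& t : workers) t.join();
    return results;
}
```

Calling run_consumer(100, 3), for example, pushes 100 messages through one receiver and three workers. In the real application the receiver's recv and the workers' sends back to the root are issued from different threads of the same process.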
I am running this code on my dual-core machine. When I launch the application with mpirun -np 2, it works fine for any number of messages generated by the producer. When I launch it with mpirun -np 4, it crashes after processing a couple of runs.
Has somebody encountered this issue before? It would be great to get some insight into why this might be happening.
Edit: Here's the error that I get every time I run my application:
*** glibc detected *** application: free(): invalid pointer: 0x00007f67d1f9f9e0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7e626)[0x7f67d0671626]
/usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x9041)[0x7f67cc790041]
/usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x5a00)[0x7f67cc78ca00]
/usr/lib/libmpi.so.0(MPI_Recv+0x154)[0x7f67d1d531e4]
/usr/local/lib/libboost_mpi.so.1.50.0(_ZN5boost3mpi6detail19packed_archive_recvEP19ompi_communicator_tiiRNS0_15packed_iarchiveER20ompi_status_public_t+0x33)[0x7f67d1fcb223]
/usr/local/lib/libboost_mpi.so.1.50.0(_ZNK5boost3mpi12communicator4recvINS0_15packed_iarchiveEEENS0_6statusEiiRT_+0x45)[0x7f67d1fc4755]
application(_ZNK5boost3mpi12communicator9recv_implI7MessageEENS0_6statusEiiRT_N4mpl_5bool_ILb0EEE+0x74)[0x464d98]
application(_ZNK5boost3mpi12communicator4recvI7MessageEENS0_6statusEiiRT_+0x3b)[0x46479b]
application(_ZN12WorkerReceiver3runEv+0xac)[0x46b1da]
/usr/local/lib/libPocoFoundation.so.12(_ZN4Poco10ThreadImpl13runnableEntryEPv+0x96)[0x7f67d26fcb16]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f67d09b7e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f67d06e54bd]
Thanks