I have developed a Fortran code, compiled with ifort, whose memory requirements scale with the size of the problem. After initialization of the problem (allocation of arrays, etc.), the main part of the code loops through a series of function calls.

One of these functions includes 3-5 call system() commands. Some are simple and only copy directories, such as:

call system('cp -r plot_files plot_files1')

Another one calls mpiexec, which runs a separate program.

The problem is that the program 'hangs' on the system calls about 50% of the time, but only for large problems in which large arrays have been allocated (~ array(300000)).

By hang I mean that qstat still shows the job as running, but looking up the PID with pstack, strace, or cat /proc/PID/status reveals that the PID no longer exists.

There are call system() commands earlier in the code, before the bulk of the initialization, and they never fail there; the failures occur only after the arrays have been allocated. This made me believe that it was a memory issue, but monitoring while the job hangs reveals that there is plenty of memory available.
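The overall pattern is roughly the one sketched below. This is a hypothetical illustration only, not the actual code: the program name, the array name `work`, the loop count, and the `echo` command are placeholders, and I have not verified that this stripped-down version reproduces the hang.

```fortran
! Hypothetical sketch of the call pattern; not the production code.
program system_after_alloc
  implicit none
  integer, parameter :: n = 300000          ! array size quoted above
  real(kind=8), allocatable :: work(:)      ! placeholder for the large arrays
  integer :: istat, step

  ! System call issued before the large allocation (this one never hangs).
  call system('echo "before allocation"')

  ! Allocate and touch the array so the memory is actually in use.
  allocate(work(n), stat=istat)
  if (istat /= 0) stop 'allocation failed'
  work = 0.0d0

  ! Main loop: the same kind of system call, now issued after allocation.
  do step = 1, 3
     call system('cp -r plot_files plot_files1')
  end do

  deallocate(work)
end program system_after_alloc
```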

[top -cbp PID before the program hangs][1]

I was originally compiling it with OpenMP in the hope of parallelizing the code later. With OpenMP the failure rate was around 80%. When OpenMP was taken out, the failure rate fell to around 50%. I've searched and searched for possible reasons for this problem and have come up empty-handed.

  • Welcome. Use tag [tag:fortran]. You can append another tag for a specific version if necessary, but your question is not version specific. System() is not standard Fortran at all. – Vladimir F Героям слава Dec 20 '16 at 22:50
  • I think your code will be necessary. See http://stackoverflow.com/help/how-to-ask and http://stackoverflow.com/help/mcve – Vladimir F Героям слава Dec 20 '16 at 22:51
  • I read through it before posting, but unfortunately I can't post code. This is for a company. Whatever is causing the problem, it is not specific to one function. It can happen in multiple areas of the code. It specifically happens after the majority of memory has been allocated and only on system calls; it does not matter what the system call is. It can be as simple as copying a text file, changing directories, copying directories or calling an mpiexec. Any ideas would be appreciated. I apologize that I cannot give any specific code. – jamber86 Dec 21 '16 at 14:13
  • Also, I've tried workarounds for the system call. I understand it's not standard Fortran, but ifort does not support the execute_command_line, which I would much rather use. I' – jamber86 Dec 21 '16 at 14:13
  • Execute_command_line works in the same way, I doubt it would make a difference in this problem. – Vladimir F Героям слава Dec 21 '16 at 14:20
  • [I guess it may be useful to post a question in the Intel forum... (btw, attaching -heap-arrays does not change anything?)] – roygvib Dec 23 '16 at 02:05
  • Since this is an MPI program you may want to check out http://stackoverflow.com/questions/10627045/random-failure-of-mpi-fortran-code – user1139069 Dec 25 '16 at 17:46

0 Answers