I have a set of Fortran source files for a .dll that I can compile with either Compaq Visual Fortran 6.6C or Intel Visual Fortran 12.1.3.300 (IA-32). The Intel binary fails at run time, while the Compaq binary works fine. I am compiling for 32-bit on a Windows 7 64-bit system. The driver that calls the .dll is written in C#.

The failure message comes from the dreaded _chkstk() call when an internal subroutine is called (called from the .dll entry routine). (SO answer on chkstk())

The procedure in question is declared as (pardon the fixed file format)

      SUBROUTINE SRF(den, crpm, icrpm, inose, qeff, rev,
     &               qqmax, lvtyp1, lvtyp2, avespd, fridry, luin,
     &               luout, lurtpo, ludiag, ndiag, n, nzdepth,
     &               unit, unito, ier)

      INTEGER*4 lvtyp1, lvtyp2, luin, luout, lurtpo, ludiag, ndiag, n,
     &          ncp, inose, icrpm, ier, nzdepth
      REAL*8    den, crpm, qeff, rev, qqmax, avespd, fridry
      CHARACTER*2  unit, unito

and called like this:

      CALL SRF(den, crpm(i), i, inose, qeff(i), rev(i),
     &         qqmax(i), lvtyp1, lvtyp2, avespd, fridry,
     &         luin, luout, lurtpo, ludiag, ndiag, n, nzdepth,
     &         unit, unito, ier)

The variable declarations at the call site are similar, except that crpm, qeff, rev and qqmax are arrays, of which only the i-th element is passed in each SRF() call.

I understand there can be stack issues if the arguments are more than 8 KB in size, but in this case we have only 7 x real(64) + 11 x int(32) + 2 x 2 x char(8) = 832 bits in passed arguments.
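For reference, the arithmetic above can be reproduced with a short free-form program. This is only a sketch: the counts (7 reals, 11 integers, 2 x 2 characters) are taken as given in the question, `real(8)` and `integer(4)` are assumed to correspond to `REAL*8` and `INTEGER*4` on this compiler, and the program name is made up for illustration. Note that Fortran compilers typically pass arguments by reference, so the stack space actually used by the argument list may be smaller still.

```fortran
program arg_size
  ! Sketch: total the storage of the SRF arguments, using the counts
  ! quoted in the question (assumed, not re-derived from the interface).
  use, intrinsic :: iso_fortran_env, only: character_storage_size
  implicit none
  integer, parameter :: n_real = 7       ! REAL*8 arguments
  integer, parameter :: n_int  = 11      ! INTEGER*4 arguments
  integer, parameter :: n_char = 2 * 2   ! two CHARACTER*2 arguments
  real(8)    :: r                        ! assumed equivalent to REAL*8
  integer(4) :: i                        ! assumed equivalent to INTEGER*4
  integer    :: bits

  ! storage_size() returns the size in bits of a scalar of that type
  bits = n_real * storage_size(r) + n_int * storage_size(i) &
       + n_char * character_storage_size

  print '(a,i0,a,i0,a)', 'arguments: ', bits, ' bits = ', bits / 8, ' bytes'
end program arg_size
```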

I have worked really hard to move arguments (especially arrays) into a module, but I keep getting the same error:

[image: error message]

The disassembly from the Intel .dll is

[image: Intel disassembly]

The disassembly from the Compaq .dll is

[image: Compaq disassembly]

Can anyone offer any suggestions on what is causing the stack overflow, or how to debug it?

PS. I have increased the reserved stack space to hundreds of MB and the problem persists. I have tried skipping the chkstk() call in the disassembler, but it crashes the program. The stack check starts from address 0x354000 and iterates down to 0x2D2000, where it crashes accessing a guard page. The stack bottom address is 0x282000.

John Alexiou
  • What compiler options are you using in each case? Try compiling with all the warning and error flags you can think of enabled. As a first step I tend to use `-std -check all -Warn all,nodec,interfaces,declarations -gen_interfaces -g -C -traceback -fpe0 -fp-stack-check` with ifort. – Chris Apr 14 '12 at 12:01
  • I expect you've spotted your own arithmetic mistake by now, but your total of 832 bytes should be 832 bits. It's not always the case, either, that each character is represented by one byte: this tends to vary with compiler and platform. With an up-to-date compiler the sizes of the various 'storage units' are available as constants defined in the ISO_FORTRAN_ENV intrinsic module. – High Performance Mark Apr 14 '12 at 18:27
  • Both comments above give me something to work with. I will investigate more. It is possible there is a stack corruption that `CVF` is not catching and somehow blows over it. There is a lot going on in the code as many developers have touched it since the 80's when it was first written. – John Alexiou Apr 15 '12 at 19:19
  • As an update, none of the tricks and changes I have tried so far have gotten me any closer to a resolution. I have reduced the number and size of parameters down to about 8 scalars and the problem persists. – John Alexiou Jun 03 '12 at 02:30
  • `_chkstk()` checks for enough stack space for local variables, not for arguments (they are already on the stack). Do you by any chance have large arrays local to the subroutine? Intel Fortran doesn't do heap allocation of local arrays by default. – Hristo Iliev Jun 05 '12 at 14:49
  • Is it possible to post the entire subroutine `SRF`? – mgilson Jun 07 '12 at 13:04
  • I cannot post `SRF` (it contains proprietary information), but I will look into local variable storage. BTW, how do you turn on heap storage for locals? I tried the `Fortran/Optimization/Heap Arrays=0` option but it made no difference. – John Alexiou Jun 07 '12 at 14:16
  • It may help to determine on which instruction the stack overflows. – Samuel Edwin Ward Jun 08 '12 at 19:49
  • @SamuelEdwinWard: As soon as I step into the function `SRF` it falls into `_chkstk()` and fails. None of the statements inside the function are hit upon. – John Alexiou Jun 09 '12 at 13:38
  • Here is a related post: http://stackoverflow.com/q/12916176/380384 – John Alexiou Dec 20 '13 at 18:34
  • Very long shot here given the question's age, but have you found the solution to this? I have **precisely** the same problem as you described. @hanspassant maybe? This question was ultimately left unanswered... – CTZStef May 05 '21 at 14:09
  • @CTZStef - Actually I did. There was a bug earlier in the code that overfilled the stack. Switching the arrays from stack allocated to heap allocated, by making them allocatable, solved the problem. – John Alexiou May 05 '21 at 16:32
  • @JohnAlexiou ok yeah, makes sense. Thanks! – CTZStef May 05 '21 at 17:57

2 Answers

You are shooting the messenger. The Compaq generated code also calls _chkstk(), the difference is that it inlined it. A common optimization. The key difference between the two snippets is:

 mov eax, 0D3668h

vs

 sub esp, 233E4h

The values you see used here are the amount of stack space required by the function. The Intel code requires 0xd3668 bytes = 865896 bytes. The Compaq code requires 0x233e4 = 144356. Big difference. In both cases that's rather a large amount, but the Intel one is getting critical: a program normally has a one megabyte stack. Gobbling up 0.86 megabytes of it is pushing it very close; nest a couple of function calls and you're looking at this site's name.
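The two frame sizes can be checked with a few lines of Fortran. This is just a sketch: the program and constant names are hypothetical, and the 1 MB default stack is the typical figure mentioned above, not something read from the actual executable.

```fortran
program frame_sizes
  ! Sketch: convert the two stack-frame sizes from the disassembly to
  ! decimal and compare them with an assumed 1 MB default stack.
  implicit none
  integer, parameter :: intel_frame   = int(z'D3668')  ! mov eax, 0D3668h
  integer, parameter :: compaq_frame  = int(z'233E4')  ! sub esp, 233E4h
  integer, parameter :: default_stack = 1024 * 1024    ! assumed default

  print '(a,i0,a,f5.1,a)', 'Intel:  ', intel_frame, ' bytes (', &
        100.0 * intel_frame / default_stack, '% of the stack)'
  print '(a,i0,a,f5.1,a)', 'Compaq: ', compaq_frame, ' bytes (', &
        100.0 * compaq_frame / default_stack, '% of the stack)'
end program frame_sizes
```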

What you need to find out (I can't help here because it is not in your snippet) is why the Intel-generated function needs so much space for its local variables. One workaround is to use the free store to find space for large arrays. Another is to use the linker's /STACK option to ask for more stack space (guessing at the option name).
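As a rough illustration of the free-store workaround, a large local array can be declared `allocatable` so that it lives on the heap rather than in the routine's stack frame. The routine and variable names below (`work_on_heap`, `work`) are invented for the example and have nothing to do with the real `SRF`; `real(8)` is assumed to match `REAL*8`.

```fortran
module heap_work
  implicit none
contains
  subroutine work_on_heap(n, total, ier)
    ! Sketch: the scratch array is allocated on the heap, so the
    ! routine's stack frame stays small regardless of n.
    integer, intent(in)  :: n
    real(8), intent(out) :: total
    integer, intent(out) :: ier
    real(8), allocatable :: work(:)   ! hypothetical scratch array

    total = 0.0d0
    allocate(work(n), stat=ier)       ! ier /= 0 means allocation failed
    if (ier /= 0) return

    work  = 1.0d0                     ! stand-in for the real computation
    total = sum(work)

    deallocate(work)
  end subroutine work_on_heap
end module heap_work

program demo_heap_work
  use heap_work
  implicit none
  real(8) :: total
  integer :: ier

  call work_on_heap(1000000, total, ier)
  if (ier == 0) then
    print '(a,f12.1)', 'sum = ', total
  else
    print '(a,i0)', 'allocation failed, stat = ', ier
  end if
end program demo_heap_work
```

Checking `stat=` keeps an out-of-memory condition from aborting the host process, which matters when the code runs inside a .dll.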

Hans Passant
  • Thanks for the info. Can you elaborate on _use the free store_ to find space? There are a lot of computations that happen in this array, filling an array of 18,000,000 elements of `REAL*8`. – John Alexiou Jun 09 '12 at 18:39
  • I'm pretty old but not quite old enough. Look at the ALLOCATE and DEALLOCATE statements in your favorite Fortran reference. – Hans Passant Jun 09 '12 at 18:46
  • These arrays are allocated in VBA and passed to the Fortran `.dll`. They are not allocated in Fortran, so I cannot use allocatable arrays. The posting has been very helpful though, as I am still looking for an answer here. – John Alexiou Nov 05 '13 at 17:15

The problem wasn't at the function call where the stack overflow occurred.

Earlier in the code, some global matrices were initialized and placed on the stack. Due to a bug in the code, they were still in scope and had already almost filled the stack. When the function call happened, the program tried to store the return address on the stack and crashed.

The solution was to make the global matrices allocatable and also to make sure the "Heap Arrays" option was set to an appropriate value.
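A minimal sketch of what that can look like, assuming the matrices live in a module. The module, routine and array names (`global_data`, `init_globals`, `free_globals`, `a`, `b`) are made up for illustration and are not the names from the real code. As far as I know, the "Heap Arrays" project setting corresponds to Intel Fortran's `/heap-arrays[:size]` command-line option, which affects automatic arrays and compiler temporaries rather than explicitly allocated ones.

```fortran
module global_data
  ! Sketch: module-level matrices that are allocated on the heap
  ! on demand instead of occupying stack space.
  implicit none
  real(8), allocatable :: a(:,:), b(:,:)   ! hypothetical global matrices
contains
  subroutine init_globals(n, ier)
    integer, intent(in)  :: n
    integer, intent(out) :: ier
    call free_globals()                    ! safe to call repeatedly
    allocate(a(n,n), b(n,n), stat=ier)
    if (ier /= 0) return
    a = 0.0d0
    b = 0.0d0
  end subroutine init_globals

  subroutine free_globals()
    if (allocated(a)) deallocate(a)
    if (allocated(b)) deallocate(b)
  end subroutine free_globals
end module global_data

program demo_globals
  use global_data
  implicit none
  integer :: ier

  call init_globals(500, ier)
  if (ier /= 0) stop 'allocation failed'
  print '(a,i0,a,i0)', 'a is ', size(a, 1), ' x ', size(a, 2)
  call free_globals()
end program demo_globals
```

Releasing the matrices explicitly in one place makes it harder for stale data to stay alive longer than intended.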

Quite the rabbit hole this was, when it was 100% my buggy code that caused the issue.

John Alexiou