
Hi, I am profiling my 2D numerical model, which I wrote for my research, with VTune (an Intel Visual Studio extension) in order to speed it up a little. I already sped up my 1D model this way (i.e., by identifying the "hotspot" of the model). This time, though, the profiler shows that the most time-consuming part is not a Fortran subroutine I wrote (as was the case for my 1D model) but a DLL called Acxtrnal.dll. I googled the name of this DLL but could not find any useful information. Does anybody know why this DLL takes so much time and what it is needed for? Thanks, A.

EDIT: I was able to download the symbols for the DLL from the Microsoft website, so now when debugging it shows that the CPU time is spent in NS_FaultTolerantHeap::APIHook_RtlFreeHeap. If I expand it, it shows these two stacks (uppercase subroutines are mine):

free<-for__free_vm<-for_write_int_fmt_xmit<-for_write_int_fmt<-LIMITERSUBR<-RECMUSCL<-MAIN__<-main<-_tmainCRTStartup<-BaseThreadInitThunk<-__RtlUserThreadStart<-_RtlUserThreadStart

for__release_lun<-for_write_int_fmt_xmit<-for_write_int_fmt<-LIMITERSUBR<-RECMUSCL<-MAIN__<-main<-_tmainCRTStartup<-BaseThreadInitThunk<-__RtlUserThreadStart<-_RtlUserThreadStart

Millemila
  • It appears it is a Windows thing for initializing USB plug-in devices: http://dll.paretologic.com/detail.php/acxtrnal. – Kyle Kanos Jan 29 '13 at 01:05
  • If you are running under a debugger, hit ^C to break into it. If you find that dll on the stack, the call stack will tell you why you're in it. – Mike Dunlavey Jan 29 '13 at 12:29
  • I edited my post above: it seems related to the Fault Tolerant Heap... I am still trying to figure this out. What I have found is this, but I do not think it is a good idea to play with the registry: http://stackoverflow.com/questions/5020418/how-do-i-turn-off-the-fault-tolerant-heap – Millemila Feb 01 '13 at 23:24



Good, you took a couple of stack samples, shown here. Your RECMUSCL is calling LIMITERSUBR, which is calling for_write_int_fmt, which is doing a lot of stuff.

free
for__free_vm
for_write_int_fmt_xmit
for_write_int_fmt
LIMITERSUBR   <------ Look at the line in LIMITERSUBR that prints integers
RECMUSCL              because it appears on both stack samples
MAIN__
main
_tmainCRTStartup
BaseThreadInitThunk
__RtlUserThreadStart
_RtlUserThreadStart  

for__release_lun
for_write_int_fmt_xmit
for_write_int_fmt
LIMITERSUBR
RECMUSCL
MAIN__
main
_tmainCRTStartup
BaseThreadInitThunk
__RtlUserThreadStart
_RtlUserThreadStart

Using the stack samples, you could look at the line of code in LIMITERSUBR where you are writing integers, and see whether you really need to be doing that.

(You see, you didn't really need the symbols in the system dll :)

It's good that you took two stack samples, so you could see the problem twice. Seeing a problem once is not enough unless you know in advance that you have a really serious slowdown. Seeing it on both of so few samples means it is probably responsible for a large fraction of the time, likely more than 50 percent and possibly close to 100, so it is worth trying to fix. (Strictly speaking, the fraction follows a Beta distribution whose most likely value is 2/2 = 100%.)
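For readers who want the numbers behind that parenthetical, here is a short worked sketch. It assumes a uniform prior on the unknown fraction f of run time spent at the offending line and treats the n stack samples as independent; s is the number of samples on which the line appears.

```latex
% Assumptions: uniform Beta(1,1) prior on f; n independent stack samples,
% s of which contain the line in question.
\[
  p(f \mid s, n) \propto f^{s}(1-f)^{n-s}
  \quad\Longrightarrow\quad
  f \mid s, n \sim \mathrm{Beta}(s+1,\; n-s+1)
\]
\[
  \operatorname{mode}(f) = \frac{s}{n}, \qquad
  \mathbb{E}[f] = \frac{s+1}{n+2}
\]
% Here s = n = 2 (the line is on both samples):
\[
  f \sim \mathrm{Beta}(3,1), \quad p(f) = 3f^{2}, \quad
  \operatorname{mode}(f) = 1, \quad \mathbb{E}[f] = \tfrac{3}{4}, \quad
  P\!\left(f > \tfrac{1}{2}\right) = 1 - \left(\tfrac{1}{2}\right)^{3} = \tfrac{7}{8}
\]
```

So under these assumptions, two hits out of two samples already give about an 87.5% chance that the line accounts for more than half of the run time, which is why it is worth fixing right away.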

Mike Dunlavey
  • Thanks, I solved it. This line was taking all that time: write(buttaPR(1:6),'(i6)') jtime, where buttaPR is declared as character*6 and jtime as integer. I did not know that writing to an internal file (inside a loop) took so much time. Anyway, I do not really need it, so I simply commented it out. Thanks, problem solved. – Millemila Feb 04 '13 at 05:18
  • @Alberto: Yes. Thanks for providing a perfect example of real-world performance tuning. – Mike Dunlavey Feb 04 '13 at 13:56
  • @Alberto: I edited the answer because I saw that you had two samples, and gave a little more explanation for other readers. – Mike Dunlavey Feb 04 '13 at 19:55
  • Thanks. I think I did need the symbols, because without them VTune did not give me the stacks, and those were the stacks of a single run (anyway, rerunning with VTune always gave the same result). – Millemila Feb 05 '13 at 00:53
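If the label were still needed, a minimal alternative to commenting the write out is to execute it only when the label is actually used. Judging from the function names in the stacks above, each internal write goes through the Intel Fortran runtime (for_write_int_fmt_xmit), which frees memory (for__free_vm -> free) and releases a logical unit (for__release_lun); with the Fault Tolerant Heap active, that free is routed through NS_FaultTolerantHeap::APIHook_RtlFreeHeap. The sketch below assumes the label is only needed every nprint steps; the module name, the time_loop subroutine, and nprint are hypothetical, while buttaPR and jtime mirror the names in the comment above.

```fortran
! Sketch only: format the step label when it will actually be used.
! ASSUMPTIONS: module/subroutine names and "nprint" (output interval)
! are hypothetical; buttaPR and jtime mirror the names in the thread.
module step_label_sketch
  implicit none
  private
  public :: time_loop

contains

  ! Runs a dummy loop of nsteps iterations. The internal write that turns
  ! jtime into a 6-character label runs only every nprint steps instead of
  ! on every iteration, so its formatting cost is paid rarely.
  ! nprint must be positive (mod(jtime, 0) is not allowed).
  subroutine time_loop(nsteps, nprint, nlabels, last_label)
    integer, intent(in)           :: nsteps, nprint
    integer, intent(out)          :: nlabels      ! how many labels were formatted
    character(len=6), intent(out) :: last_label   ! most recent label, blank if none
    character(len=6) :: buttaPR
    integer :: jtime

    nlabels = 0
    last_label = ' '
    do jtime = 1, nsteps
      ! ... numerical work for this step (e.g. the limiter) goes here ...

      if (mod(jtime, nprint) == 0) then
        write(buttaPR, '(i6)') jtime   ! same internal write, now guarded
        nlabels = nlabels + 1
        last_label = buttaPR
      end if
    end do
  end subroutine time_loop

end module step_label_sketch
```

For example, call time_loop(1000, 100, n, lab) performs 10 internal writes instead of 1000 and leaves lab equal to '  1000'. Guarding the write keeps the output for the steps that need it; removing it entirely, as was done here, is of course cheaper still.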