
Is the following code correct? I have a 2 GB GeForce 750M and am using the PGI Fortran compiler. The program works fine for 4000x4000 arrays, but anything larger triggers a runtime error even though it should not: as you can see, I have declared 9000x9000 arrays, yet any n value > 4000 fails at run time.

program matrix_multiply
!use openacc
   implicit none
   integer :: i,j,k,n
   real, dimension(9000,9000) :: a, b, c
   real x_scalar
   real x_vector(2)
   n=5000
   call random_number (b)
   call random_number (a)
   !$acc kernels 
   do k = 1,n
      do i = 1,n
         do j = 1,n
            c(i,k) = c(i,k) + a(i,j) * b(j,k)
         enddo
      enddo
   enddo
!$acc end kernels
end program matrix_multiply        
  • Could you be more specific than "complains"? What is the error message? – M. S. B. Dec 13 '13 at 05:17
  • I guess a stack size problem, as the arrays still need to be in host memory. Also, maybe beside the point, the matrix multiplication could be done with cuBLAS as well (a sketch of this appears after these comments). – steabert Dec 13 '13 at 08:00
  • I was able to compile and run the above code successfully on a Tesla M2050 (3GB), PGI 13.10 compiler, CUDA 5.0, RHEL 5.5. Even if I increase `n` to 9000, it runs correctly (takes about 60 sec.) Sorry I don't have a GeForce 750M to try it out. – Robert Crovella Dec 16 '13 at 02:35
  • The error I get is a cuStreamSynchronize() 702 timeout. – Jovi DSilva Dec 16 '13 at 10:06
  • @RobertCrovella I guess the problem occurs because I have a display attached. I believe the Teslas have no provision for connecting a display. – Jovi DSilva Dec 16 '13 at 10:08
  • @steabert I am working on optimizing a legacy Fortran program by running it on the GPU. Matrix multiplication is part of it, but I hit the problem described above when I stretch the limits. – Jovi DSilva Dec 16 '13 at 10:27
  • Are you on Windows or Linux? – Robert Crovella Dec 16 '13 at 13:06
  • I am using Mac OS 10.9 – Jovi DSilva Dec 17 '13 at 05:42
  • I'm not that familiar with the Mac. My guess is that there is [some sort of display timeout on the mac](https://discussions.apple.com/thread/2620890) (also [here](http://stackoverflow.com/questions/11027151/disable-nvidia-watchdog-with-opencl-on-mac-os-x-10-7-4)). As you increase to a larger size, the matrix multiply kernel takes longer. At some point the display driver timeout in the Mac OS resets the GPU. If that is the case, you could work around it by switching to a system/GPU where the GPU is not hosting a display. Both Linux and Windows (TDR) also have such timeout mechanisms. – Robert Crovella Dec 19 '13 at 01:21
  • @RobertCrovella OK, so there is no way of setting this timeout? What you are saying is correct; this timeout is very small, and I believe it varies across OS and GPU. – Jovi DSilva Dec 19 '13 at 03:51
  • I'm familiar with some options on Windows and Linux, but I'm not familiar with any way to change it on Mac OS. I suspect the timeout is on the order of a few seconds. You may need to limit your kernel execution to less than that, or else find another environment if you must run long-running kernels. – Robert Crovella Dec 19 '13 at 04:23
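
Following up on steabert's cuBLAS comment above, here is a minimal sketch (not from the original thread) of handing the multiplication to cuBLAS from OpenACC. It assumes the cublas interface module shipped with the PGI compilers and the legacy cublasSgemm argument list; the exact module and interface can differ between compiler versions, so treat it purely as an illustration.

program matrix_multiply_cublas
   use cublas          ! PGI-supplied cuBLAS interface module (assumed available)
   implicit none
   integer, parameter :: n = 5000
   real, dimension(n,n) :: a, b, c

   call random_number(a)
   call random_number(b)

   !$acc data copyin(a,b) copyout(c)
   ! Pass the device copies of a, b and c to the library: c = 1.0*a*b + 0.0*c
   !$acc host_data use_device(a,b,c)
   call cublasSgemm('N', 'N', n, n, n, 1.0, a, n, b, n, 0.0, c, n)
   !$acc end host_data
   !$acc end data
end program matrix_multiply_cublas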

1 Answer


Thanks to Robert Crovella

My guess is that there is [some sort of display timeout on the mac](https://discussions.apple.com/thread/2620890) (also [here](http://stackoverflow.com/questions/11027151/disable-nvidia-watchdog-with-opencl-on-mac-os-x-10-7-4)). As you increase to a larger size, the matrix multiply kernel takes longer. At some point the display driver timeout in the Mac OS resets the GPU. If that is the case, you could work around it by switching to a system/GPU where the GPU is not hosting a display. Both Linux and Windows (TDR) also have such timeout mechanisms.

You have to boot into `>console` mode in Mac OS and also disable automatic graphics switching; console mode turns off Aqua (the GUI in Mac OS) and is thus supposed to remove the limitation.
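
If booting into console mode is not an option, Robert Crovella's other suggestion in the comments is to keep each individual kernel launch shorter than the watchdog timeout. Below is a minimal sketch of that idea (not from the original thread): the outer loop is split into chunks inside one data region, so a, b and c are transferred only once while each chunk runs as its own, shorter kernel. The chunk size of 500 is an arbitrary illustrative value that would need tuning, and c is zeroed explicitly, which the original code omits.

program matrix_multiply_chunked
   implicit none
   integer, parameter :: n = 5000, chunk = 500   ! chunk size is illustrative only
   integer :: i, j, k, kstart, kend
   real, dimension(n,n) :: a, b, c

   call random_number(b)
   call random_number(a)
   c = 0.0                          ! the original code leaves c uninitialized

   ! Keep a, b and c resident on the GPU for the whole computation,
   ! but launch one shorter kernel per chunk of columns of c so each
   ! launch stays under the display watchdog timeout.
   !$acc data copyin(a,b) copy(c)
   do kstart = 1, n, chunk
      kend = min(kstart + chunk - 1, n)
      !$acc kernels
      do k = kstart, kend
         do i = 1, n
            do j = 1, n
               c(i,k) = c(i,k) + a(i,j) * b(j,k)
            enddo
         enddo
      enddo
      !$acc end kernels
   enddo
   !$acc end data
end program matrix_multiply_chunked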
