I ran this program on a single GPU with 1 GB of global memory, and it gave the following error:
Fatal error: cudaMemcpy1 error (unspecified launch failure at CheckDevice.cu:27)
*** FAILED - ABORTING
========= Out-of-range Shared or Local Address
========= at 0x000006a8 in grid::SetSubgridMarker(grid*, grid*)
========= by thread (0,0,0) in block (0,0,0)
========= Device Frame:SetAllFlags_dev(param_t*, grid*) (SetAllFlags_dev(param_t*, grid*) : 0x108)
========= Device Frame:SetAllFlags(param_t*, grid*) (SetAllFlags(param_t*, grid*) : 0x38)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x3dc) [0xc9edc]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.5.0 [0xa18a]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.5.0 (cudaLaunch + 0x17f) [0x2f4cf]
========= Host Frame:Transport [0xd395]
========= Host Frame:Transport [0xd7bd]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:Transport [0x17bd]
=========
========= Program hit error 4 on CUDA API call to cudaMemcpy
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/libcuda.so [0x26a180]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.5.0 (cudaMemcpy + 0x271) [0x348e1]
========= Host Frame:Transport [0x2cea]
========= Host Frame:Transport [0x3769]
========= Host Frame:Transport [0xd7ee]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:Transport [0x17bd]
=========
========= Program hit error 4 on CUDA API call to cudaGetLastError
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/libcuda.so [0x26a180]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.5.0 (cudaGetLastError + 0x1e6) [0x2a046]
========= Host Frame:Transport [0x2cef]
========= Host Frame:Transport [0x3769]
========= Host Frame:Transport [0xd7ee]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:Transport [0x17bd]
=========
========= ERROR SUMMARY: 3 errors
For the unspecified launch failure, the relevant lines of code are a cudaMemcpy operation and its error check:
cudaMemcpy(CurrentGrid, Grid_dev, sizeof(grid), cudaMemcpyDeviceToHost);
cudaCheckErrors("cudaMemcpy1 error");
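The post does not show how cudaCheckErrors is defined. The "Fatal error: ... *** FAILED - ABORTING" output and the cudaGetLastError call in the memcheck log suggest it is a macro that checks cudaGetLastError. The version below is a reconstruction under that assumption, not the author's code, and exampleKernel / launchAndCheck are hypothetical stand-ins for the real SetAllFlags launch. Kernel launches are asynchronous, so a fault inside a kernel is usually reported by the next synchronizing call (here the cudaMemcpy at CheckDevice.cu:27). For that reason the sketch also synchronizes right after the launch, so the error is attributed to the kernel:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed error handler: prints in the same format as the output above
// ("Fatal error: <msg> (<CUDA error string> at <file>:<line>)") and exits.
// Returns normally when err == cudaSuccess.
static void checkCuda(cudaError_t err, const char* msg, const char* file, int line)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n",
                msg, cudaGetErrorString(err), file, line);
        fprintf(stderr, "*** FAILED - ABORTING\n");
        exit(1);
    }
}

// Hypothetical definition of cudaCheckErrors: checks (and clears) the last
// error recorded by the CUDA runtime for this host thread.
#define cudaCheckErrors(msg) checkCuda(cudaGetLastError(), (msg), __FILE__, __LINE__)

// Hypothetical stand-in kernel; in the post this role is played by SetAllFlags.
__global__ void exampleKernel(int* out)
{
    out[threadIdx.x] = (int)threadIdx.x;
}

// Launch, then check twice: once for launch/configuration errors, and once
// after synchronizing, so faults raised while the kernel runs are reported
// here instead of at a later, unrelated cudaMemcpy.
static void launchAndCheck(int* d_out, int n)
{
    exampleKernel<<<1, n>>>(d_out);
    cudaCheckErrors("exampleKernel launch");
    checkCuda(cudaDeviceSynchronize(), "exampleKernel execution", __FILE__, __LINE__);
}
```

With this pattern applied to the SetAllFlags launch, the failure would be reported at the line of the launch rather than at CheckDevice.cu:27.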
Then, as the error message shows, there is an Out-of-range Shared or Local Address at 0x000006a8 in grid::SetSubgridMarker(grid*, grid*). Is this caused by running out of global memory on the device? Is there a way to query memory usage on the device?
In the source code, CheckDevice.cu is executed after grid::SetSubgridMarker, and it does not consume much memory on the device, so I'm guessing (without much confidence) that grid::SetSubgridMarker exhausts the memory, leaving no space for the cudaMemcpy operation. Any suggestions? Thanks very much!
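On the question of querying memory usage: the CUDA runtime function cudaMemGetInfo(size_t* free, size_t* total) reports the free and total global memory of the current device. Below is a minimal sketch. printDeviceMemory and percentUsed are hypothetical helper names, and it assumes the helpers are called from host code before and after the suspect kernel:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: percentage of device memory in use, given the free and
// total byte counts reported by cudaMemGetInfo. Pure host arithmetic.
static double percentUsed(size_t freeBytes, size_t totalBytes)
{
    if (totalBytes == 0) return 0.0;
    return 100.0 * (double)(totalBytes - freeBytes) / (double)totalBytes;
}

// Hypothetical helper: prints free/total global memory on the current device
// and returns the CUDA error code so the caller can act on it.
static cudaError_t printDeviceMemory(const char* label)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: cudaMemGetInfo failed: %s\n",
                label, cudaGetErrorString(err));
        return err;
    }
    printf("%s: %.1f MiB free of %.1f MiB (%.1f%% used)\n", label,
           freeBytes / (1024.0 * 1024.0), totalBytes / (1024.0 * 1024.0),
           percentUsed(freeBytes, totalBytes));
    return cudaSuccess;
}
```

One caveat: after an unspecified launch failure the context is left in an error state, and later runtime calls (including this one) return the same error, as the log shows for cudaMemcpy and cudaGetLastError. The check is therefore most informative before the failing launch. It also counts only global memory, so on its own it would not explain an out-of-range shared or local address.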