
I am running a Monte Carlo simulation using Thrust on an Nvidia card with compute capability 2.1. If I try to transform_reduce the whole device_vector at once, I get the following error. It's not a matter of running out of memory on the device, because the vectors aren't that big (~1-10 MB). I know my code is right because it works if I compile with OpenMP and run on the host only. What could be causing this problem?

Unhandled exception at 0x776e15de in mccva.exe: Microsoft C++ exception: thrust::system::system_error at memory location 0x0014cb28.

But if I do the transform_reduce in chunks, it works fine until I scale up the number of timesteps in the simulation, at which point it gives the same error.

//run the Monte Carlo simulation
zpath * norm_ptr = thrust::raw_pointer_cast(&z[0]);
cout << "initialized raw pointer" << endl;
thrust::device_vector<ctrparty> devctrp = ctrp;
assert(devctrp.size()==ctrp.size());
cout << "Initialized device vector" << endl;
cout << "copied host vec to device vec" << endl;

float cva = 0;
for(unsigned int i=0; i<5; i++)
{
    if(i<4)
        cva += (1-R) * thrust::transform_reduce(devctrp.begin()+i*2000, devctrp.begin() + (i+1)*2000 - 1, calc(norm_ptr, dt, r, sims, N), 0.0f, sum());
    else
        cva += (1-R) * thrust::transform_reduce(devctrp.begin()+i*2000, devctrp.begin() + (i+1)*2000, calc(norm_ptr, dt, r, sims, N), 0.0f, sum());
}  
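One detail worth noting about the chunked loop: Thrust algorithms take half-open ranges [first, last), so subtracting 1 from the end iterator leaves out the last element of each of the first four chunks. The sketch below shows half-open chunking that visits every element exactly once. It is only an illustration under stated assumptions: std::vector and std::accumulate stand in for thrust::device_vector and thrust::transform_reduce, and the names chunked_sum and chunk are hypothetical, not part of the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the original post): sums `v` in half-open
// chunks [begin, end) of at most `chunk` elements, so that every element
// is visited exactly once regardless of the vector's length.
float chunked_sum(const std::vector<float>& v, std::size_t chunk)
{
    float total = 0.0f;
    if (chunk == 0) return total;  // nothing sensible to do with empty chunks

    for (std::size_t begin = 0; begin < v.size(); begin += chunk)
    {
        // Clamp the end so the final chunk may be shorter than `chunk`.
        const std::size_t end = std::min(begin + chunk, v.size());

        // The end iterator is exclusive, so no "- 1" adjustment is needed.
        total += std::accumulate(v.begin() + begin, v.begin() + end, 0.0f);
    }
    return total;
}
```

Deriving the chunk boundaries from the vector's size, rather than hard-coding 5 chunks of 2000, also keeps the loop valid if the number of counterparties changes.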

I get the error when I try this:

float cva = 0.0f;
try
{
    cva = thrust::transform_reduce(devctrp.begin(), devctrp.end(), calc(norm_ptr, dt, r, sims, N), 0.0f, sum()); //get the simulated CVA
}
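catch(const thrust::system_error& e) // catch by const reference to avoid copying/slicing
{
    printf("%s\n", e.what()); // never pass a runtime string as the format argument
}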

I'm using VS2010, and when it breaks on the error it points to the following in the dbgheap.c file:

__finally {
    /* unlock the heap
     */
    _munlock(_HEAP_LOCK);
}
postelrich
  • What are the definitions of `calc()` and `sum()`? One of those may be the issue. You could try doing just a `thrust::transform` with `calc` and just a `thrust::reduce` with `sum()` to see if you can narrow down the source of the error. For instance, `norm_ptr` points to the device array `z`. I don't know how `calc` uses it exactly, but if it is indexing through `z` in some fashion, then perhaps when you increase the length of the transform, you're running into trouble there. It's just speculation, but it would help to see a more complete description of what you are doing in the transform. – Robert Crovella Mar 08 '13 at 08:01
  • Are you building the debug or release version of the project? – Robert Crovella Mar 08 '13 at 08:03
  • I have checked that the calc and sum functions are working correctly by using printf within the functions. What I see from printf when transform_reducing the entire vector at once is that it seems to break the work up into chunks itself: I see "calc" followed by "sum" followed by more "calc" and "sum", but it craps out somewhere along the way. I'm using a debug build, though I do have the -g and -G flags turned off. – postelrich Mar 08 '13 at 16:09
  • Could be that your computation requires too much time and is being terminated by the "watchdog timer". – Jared Hoberock Mar 08 '13 at 20:10
  • @JaredHoberock Is there a way I can check that? – postelrich Mar 09 '13 at 06:43
  • Not sure, but you could refer to [this question](http://stackoverflow.com/questions/497685/how-do-you-get-around-the-maximum-cuda-run-time) for some ideas. – Jared Hoberock Mar 10 '13 at 00:22
  • It seems like on Windows you have to mess with the registry, which can be dangerous. It says the watchdog timer only applies to a graphics card attached to a monitor. My laptop has both the Intel HD 4000 and the Nvidia card. Is there some way to run the program on the Nvidia card while the integrated graphics takes care of the display? – postelrich Mar 11 '13 at 04:23

1 Answer

I had that kind of error with Thrust when I forgot to set the project properties to match my CUDA card's compute capability:

Configuration Properties > CUDA C/C++ > Device > Code Generation: change compute_10,sm_10 to your GPU's compute capability.

For an Nvidia card with compute capability 2.1, it would be compute_20,sm_21.

Rodion
  • I found it to be an error with the Windows watchdog timer timing out. I'm already using sm_20. Does 2.1 happen to deal with that? – postelrich May 28 '13 at 15:05
  • What exactly causes the Windows watchdog timer to time out? – Rodion May 29 '13 at 16:24
  • I'm not entirely sure, just that Windows will cause the Nvidia driver to crash if it doesn't get a response from the device in time. Or something along those lines. – postelrich May 30 '13 at 00:56