I am writing a path tracer for the GPU using CUDA 10.2. The entire program ran fine until I added a recursive call to the trace function. nvcc still compiles it, but with the warning "Stack size for entry function '' cannot be statically determined". When the GPU reaches that point, it stops, and the next CUDA API call on the CPU returns error 715, which is cudaErrorIllegalInstruction. I tried to recreate the issue by writing another recursive kernel/function pair; the compiler gave the same warning, but that program ran as expected. Unfortunately, this means I have to dump my entire function here (if you have any questions about the functions and types used, I will happily answer them):
__device__ Vec3 trace(
    const Settings& settings,
    const Ray& r,
    const Shape* shapes,
    const size_t nshapes,
    uint8_t bounces,
    curandState& randState) {
    if (bounces >= settings.maxBounces) {
        return Vec3(0.0f);
    }
    const Shape* shape = nullptr;
    float t = inf;
    bool flipNormal;
    float dist;
    for (size_t i = 0; i < nshapes; i++) {
        if (shapes[i].intersect(r, dist, flipNormal) && dist < t) {
            shape = shapes + i;
            t = dist;
        }
    }
    if (shape == nullptr)
        return settings.background;
    const Vec3 hitPos = r.ori + t * r.dir;
    const Vec3 normal = flipNormal ? -shape->normal(hitPos) : shape->normal(hitPos);
    const Vec3 hemiDir = cosineSample(normal, randState);
    const Vec3 traceCol = trace(
        settings,
        Ray(hitPos + normal * settings.bias, hemiDir),
        shapes,
        nshapes,
        bounces + 1,
        randState
    );
    return shape->surface.emittance + shape->surface.color * traceCol;
}
Has anyone else run into this issue, and if so, how did you fix it? I could probably redesign the function to be non-recursive, although that wouldn't be an optimal solution. I don't even know where to start debugging this, so any ideas are greatly appreciated.
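
For reference, the non-recursive redesign mentioned above would presumably replace the recursion with a loop that carries a running "throughput" (the product of the surface colors seen so far). The sketch below is not taken from the tracer itself. `Hit`, `shadeRecursive` and `shadeIterative` are hypothetical names, the `Vec3` here is a minimal stand-in for the real type, and the intersection and sampling steps are replaced by a precomputed array of hits so that only the accumulation logic remains. The macro guard is an assumption that lets the same code build with either nvcc or a plain C++ compiler.

```cpp
// Let the sketch compile with a host-only compiler as well as nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Minimal stand-in for the tracer's Vec3 (assumption: component-wise ops).
struct Vec3 { float x, y, z; };

__host__ __device__ inline Vec3 operator+(Vec3 a, Vec3 b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}
__host__ __device__ inline Vec3 operator*(Vec3 a, Vec3 b) {
    return Vec3{a.x * b.x, a.y * b.y, a.z * b.z};
}

// Hypothetical: the surface found at each bounce, in order.
// In the real tracer this comes from the intersection loop.
struct Hit { Vec3 emittance; Vec3 color; };

// Same shape as the recursive trace(): L_k = E_k + C_k * L_{k+1}.
__host__ __device__ Vec3 shadeRecursive(const Hit* hits, int nhits, int k,
                                        int maxBounces, Vec3 background) {
    if (k >= maxBounces) return Vec3{0.0f, 0.0f, 0.0f};
    if (k >= nhits) return background;  // the ray left the scene
    return hits[k].emittance +
           hits[k].color * shadeRecursive(hits, nhits, k + 1, maxBounces, background);
}

// Loop form: add each emittance scaled by the product of all earlier colors.
__host__ __device__ Vec3 shadeIterative(const Hit* hits, int nhits,
                                        int maxBounces, Vec3 background) {
    Vec3 radiance{0.0f, 0.0f, 0.0f};
    Vec3 throughput{1.0f, 1.0f, 1.0f};
    for (int k = 0; k < maxBounces; ++k) {
        if (k >= nhits) {  // the ray left the scene
            radiance = radiance + throughput * background;
            break;
        }
        radiance = radiance + throughput * hits[k].emittance;
        throughput = throughput * hits[k].color;
    }
    return radiance;
}
```

Both functions expand to the same sum, so the loop form should give the same color as the recursive one without needing a per-thread call stack whose depth the compiler cannot determine statically.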