
I would like to know the best practice concerning the following type of warning:

ptxas warning : Stack size for entry function '_Z11cuda_kernelv' cannot be statically determined

It appears after adding the virtual keyword to the destructor of Internal, i.e. changing __device__ ~Internal(); to __device__ virtual ~Internal(); in the following programme:

#include <cstdio>

template<typename T>
class Internal {
  T val;
public:
  __device__ Internal();
  __device__ virtual ~Internal();
  __device__ const T& get() const;
};

template<typename T>
__device__ Internal<T>::Internal(): val() {}
template<typename T>
__device__ Internal<T>::~Internal() {}
template<typename T>
__device__ const T& Internal<T>::get() const { return val; }


template<typename T>
class Wrapper {
  Internal<T> *arr;
public:
  __device__ Wrapper(size_t);
  __device__ virtual ~Wrapper();
};

template<typename T>
__device__ Wrapper<T>::Wrapper(size_t len): arr(nullptr) {
  printf("%s\n", __PRETTY_FUNCTION__);
  arr = new Internal<T>[len];
}

template<typename T>
__device__ Wrapper<T>::~Wrapper() {
  delete[] arr;
}

__global__ void cuda_kernel() {
  Wrapper<double> *wp = new Wrapper<double>(10);
  delete wp; 
}

int main() {
  cuda_kernel<<<1,1>>>();
  cudaDeviceSynchronize();
  return 0;
}

Faced with the warning shown above, what should I do in this case?

unegare
  • Unrelated tactical note: This may have been eliminated during minimization, but [`Wrapper` does not observe the Rule of Three (or friends)](https://en.cppreference.com/w/cpp/language/rule_of_three). – user4581301 Mar 28 '22 at 20:30
  • @user4581301, Thank you for the note. Sticking to the rule even in this short reproducible example does not change anything about the warning [tested], and I think leaving it out keeps the example more concise and "more to the point". Still, thanks for your advice. – unegare Mar 28 '22 at 20:53
  • [Cannot reproduce this online](https://godbolt.org/z/KjrjPfzKc). What compiler version and what compiler flags are you using? – Artyer Mar 28 '22 at 21:30
  • In your godbolt example, go to the output menu and select "compile to binary". Then you will see the report in godbolt. – Robert Crovella Mar 28 '22 at 22:12
  • The warning is effectively an assembler optimisation pass warning. All you can do is ignore it, understanding that there are potentially performance opportunities being left on the table. – talonmies Mar 28 '22 at 22:39

1 Answer


The very short answer is that there is nothing you can do about this particular warning.

In more detail:

  1. This warning is an assembler warning, not a compiler warning.
  2. The NVIDIA toolchain relies on many assembler-level optimizations to produce performant SASS machine code that runs on the silicon. The NVIDIA compiler emits PTX, a virtual machine language that can undergo significant transformation when assembled. This includes resolving the compiler's static single assignment (SSA) output into concrete register assignments (with register spilling to local memory), inline expansion of functions, and emission of a statically sized stack reservation. All of these are potentially performance-optimizing operations.
  3. This is an informational warning from the assembler, which is telling you that during a static code analysis, the assembler was unable to determine the stack size.
  4. The most common scenario in which the assembler emits this warning is when recursion is detected in kernel code. Your case is clearly different: the likely trigger here is the virtual destructor, which is invoked through an indirect call that the assembler cannot resolve statically.
  5. The warning comes from an assembler optimisation pass. The assembler is letting you know that there are potentially performance improvement opportunities being left on the table because the structure of the compiler output for your code does not allow it to statically determine the stack size.
  6. The fallback the assembler uses is more boilerplate SASS to set up and tear down the per-thread stack that the kernel requires to run. The warning is letting you know that this happened.
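
As a rough illustration of points 4 and 6, here is a minimal sketch (not taken from the question's code) of the recursion case, together with the runtime calls that inspect and adjust the per-thread stack limit. The names `fib` and `recursive_kernel` and the value 4096 are assumptions made up for this example. Whether ptxas actually emits the warning for it may depend on the toolkit version and optimisation level.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical recursive device function. Recursion is the common case in
// which ptxas cannot bound the stack depth at assembly time.
__device__ unsigned fib(unsigned n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Hypothetical kernel that calls the recursive function once.
__global__ void recursive_kernel(unsigned n, unsigned *out) {
  *out = fib(n);
}

int main() {
  // Query the current per-thread stack limit for the device.
  size_t stack_bytes = 0;
  cudaError_t err = cudaDeviceGetLimit(&stack_bytes, cudaLimitStackSize);
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaDeviceGetLimit: %s\n", cudaGetErrorString(err));
    return 1;
  }
  printf("per-thread stack limit: %zu bytes\n", stack_bytes);

  // Raise the limit before launching if deep call chains are expected.
  // 4096 is an arbitrary value chosen for this sketch.
  err = cudaDeviceSetLimit(cudaLimitStackSize, 4096);
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
    return 1;
  }

  unsigned *d_out = nullptr;
  err = cudaMalloc(&d_out, sizeof(unsigned));
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
    return 1;
  }

  recursive_kernel<<<1, 1>>>(10, d_out);
  err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    cudaFree(d_out);
    return 1;
  }

  unsigned h_out = 0;
  cudaMemcpy(&h_out, d_out, sizeof(unsigned), cudaMemcpyDeviceToHost);
  printf("fib(10) = %u\n", h_out);

  cudaFree(d_out);
  return 0;
}
```

Adjusting the limit does not silence the warning. It only matters if a kernel needs more stack at run time than the default allows.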
talonmies