1

Suppose some kernel (a __global__ function named foo) is running on a CUDA device. And suppose that kernel calls a __device__ function bar which is sometimes called from other kernels, i.e. the code of bar does not know at compile-time whether the kernel is foo or something else.

Can a thread running foo, within bar, obtain either the name "foo", the signature, or some other identifier of the kernel, preferable a human-readable one?

If necessary, assume the code has been compiled with any of --debug, --device-debug and/or --lineinfo.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • What you're describing is very close to reflection, and that isn't supported by C++, let alone CUDA. I would be incredibly surprised if the answer wasn't no. – talonmies Dec 30 '19 at 17:34
  • @talonmies: I didn't ask if it's support by the _language_ per se. But on the host-side it's perfectly possible to get a [stack trace](https://www.boost.org/doc/libs/1_71_0/doc/html/stacktrace.html). As for reflection more generally - a bunch of it is supported "en passent" in various ways, e.g. getting the [string name of a type at compile-time](https://stackoverflow.com/questions/35941045/can-i-obtain-c-type-names-in-a-constexpr-way). – einpoklum Dec 30 '19 at 17:58
  • If it isn't supported by the language from which the code was compiled, it is extremely unlikely to be supported by some lower level facility, especially on an extremely thin and primitive runtime like a GPU – talonmies Dec 30 '19 at 18:02
  • @talonmies: The thing is, CUDA has some "surprising" features, like being able to `printf()` from the device side, or dynamic memory allocation. Plus, there is debugging support. So, who knows, maybe some kernel-name-bookkeeping does happen somewhere on the side. – einpoklum Dec 30 '19 at 18:20

1 Answers1

1

The kernel can read the special register %gridid. %gridid is unique per launch. If performance then a simple kernel prolog can have one thread from each kernel launch output the gridid global function map using func and %gridid. Alternatively, the CUPTI SDK Activity API can be used to collect this information. The CUpti_ActivityKernel2 event contains per launch meta-data including the gridId and CUfunction name.

Here is an example reading %gridid.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <stdint.h>

cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size);

static __device__ __inline__ uint64_t __gridid()
{
    uint64_t gridid;
    asm volatile("mov.u64 %0, %%gridid;" : "=l"(gridid));
    return gridid;
}

__device__ void devPrintName()
{
    static const char* name = __func__;
    printf("%llu %s\n", __gridid(), name);
}

__global__ void globPrintName()
{
    static const char* name = __func__;
    printf("%llu %s\n", __gridid(), name);
    devPrintName();
}

int main()
{
    for (int i = 0; i < 4; ++i)
    {
        globPrintName<<<1,1,0>>>();
        cudaDeviceReset();
    }
    return 0;
}

This sample outputs

1 globPrintName
1 devPrintName
2 globPrintName
2 devPrintName
3 globPrintName
3 devPrintName
4 globPrintName
4 devPrintName
Greg Smith
  • 11,007
  • 2
  • 36
  • 37
  • So, +1, but this requires the cooperation of all kernels calling the device function. – einpoklum Mar 20 '20 at 16:51
  • Also - getters for `%gridid` and all other special registers [are available](https://github.com/eyalroz/cuda-kat/blob/master/src/kat/on_device/ptx/special_registers.cuh) with my [CUDA kernel author's toolkit library (cuda-kat)](https://github.com/eyalroz/cuda-kat/tree/development). – einpoklum Mar 20 '20 at 16:53