I am trying to run a very simple CUDA program:
#include <stdio.h>

__global__ void print_kernel() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    print_kernel<<<10, 10>>>();
    cudaDeviceSynchronize();
}
but I get an error because the system is rather old and printf is not supported in device code in this "compute capability 1.1" environment (device-side printf requires compute capability 2.0 or higher). Is there a way to print the thread and block numbers, or to get a value out of a device function and inspect it in the host function main?
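For reference, this is roughly the kind of workaround I have in mind, as a minimal sketch rather than a tested solution: each thread writes its block and thread index into a global-memory buffer, and the host copies the buffers back with cudaMemcpy and prints them with the ordinary host printf. The kernel name record_ids_kernel, the buffer names, and the NUM_BLOCKS / THREADS_PER_BLOCK macros are my own placeholders, and I am assuming a toolkit old enough to still target this hardware (for example compiling with nvcc -arch=sm_11).

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Launch configuration copied from the original example.
#define NUM_BLOCKS 10
#define THREADS_PER_BLOCK 10

// Hypothetical kernel: instead of calling printf on the device,
// each thread stores its block and thread index in global memory.
__global__ void record_ids_kernel(int *block_ids, int *thread_ids) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // unique slot per thread
    block_ids[gid] = blockIdx.x;
    thread_ids[gid] = threadIdx.x;
}

int main() {
    const int n = NUM_BLOCKS * THREADS_PER_BLOCK;
    const size_t bytes = n * sizeof(int);

    int h_block_ids[NUM_BLOCKS * THREADS_PER_BLOCK];
    int h_thread_ids[NUM_BLOCKS * THREADS_PER_BLOCK];
    int *d_block_ids = NULL;
    int *d_thread_ids = NULL;

    // Allocate one output slot per thread on the device.
    if (cudaMalloc((void **)&d_block_ids, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_thread_ids, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    record_ids_kernel<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>(d_block_ids, d_thread_ids);

    // Report launch errors (for example, a binary built for the wrong architecture).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // cudaMemcpy blocks until the kernel has finished, so no separate
    // synchronization call is needed before reading the results.
    err = cudaMemcpy(h_block_ids, d_block_ids, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_thread_ids, d_thread_ids, bytes, cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print on the host, where printf is always available.
    for (int i = 0; i < n; ++i) {
        printf("Hello from block %d, thread %d\n", h_block_ids[i], h_thread_ids[i]);
    }

    cudaFree(d_block_ids);
    cudaFree(d_thread_ids);
    return 0;
}
```

Unlike device-side printf, the output here comes out in buffer order (block 0 first, then block 1, and so on), so it says nothing about the order in which the threads actually ran. Is this the expected way to do it on such old hardware, or is there something closer to a real device-side print?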