CUDA: same kernel, but different results with __constant__

Asked Mar 01 '22 at 15:28

Active Mar 03 '22 at 23:27

Viewed 145 times

How can cudaMemcpyToSymbol produce this behavior?

// head.h
#include <stdio.h>
__constant__ float const_mem[1];

__global__ void k0();   // defined in separate.cu
__global__ void k1();   // defined in main.cu

//separate.cu
#include "head.h"

__global__ void k0() {
     printf("%f\n", const_mem[0]);
}

//main.cu
#include "head.h"

__global__ void k1() {
     printf("%f\n", const_mem[0]);
}

int main() {
     float arr[] = {5};
     cudaMemcpyToSymbol(const_mem, arr, sizeof(float));
     k0<<<1,1>>>();
     k1<<<1,1>>>();
}

Compilation: nvcc main.cu separate.cu

Output of sudo nvprof ./a.out (running ./a.out on its own prints literally nothing):

0.000000
5.000000

That means the kernel written in the other translation unit is not accessing the constant memory that was written by cudaMemcpyToSymbol. How is that possible?
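One likely explanation, offered as a sketch rather than a confirmed diagnosis: without relocatable device code, nvcc compiles each .cu file as a separate device module, so the `__constant__` definition in head.h produces an independent copy of const_mem in each file. cudaMemcpyToSymbol in main.cu would then fill only main.cu's copy, while k0 reads the untouched, zero-initialized copy belonging to separate.cu. Assuming that is the cause, declaring the symbol `extern` in the header, defining it exactly once, and building with `-rdc=true` should let both kernels read the same value. The three files are shown in one block, following the layout used above:

```cuda
// head.h
#include <stdio.h>
extern __constant__ float const_mem[1];  // declaration only; defined once in main.cu

__global__ void k0();   // defined in separate.cu
__global__ void k1();   // defined in main.cu

// separate.cu
#include "head.h"

__global__ void k0() {
    printf("%f\n", const_mem[0]);
}

// main.cu
#include "head.h"

__constant__ float const_mem[1];  // the single definition shared by both files

__global__ void k1() {
    printf("%f\n", const_mem[0]);
}

int main() {
    float arr[] = {5};
    cudaMemcpyToSymbol(const_mem, arr, sizeof(float));
    k0<<<1,1>>>();
    k1<<<1,1>>>();
    cudaDeviceSynchronize();  // wait so the device-side printf output is flushed
    return 0;
}

// Build with relocatable device code so the extern symbol is resolved at device link time:
//   nvcc -rdc=true main.cu separate.cu
```

An alternative that avoids `-rdc=true` is to keep every kernel that reads const_mem in the same .cu file as the cudaMemcpyToSymbol call.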

edited Mar 03 '22 at 23:27

talonmies

asked Mar 01 '22 at 15:28

Vadim Kashtanov

the reason `./a.out` gives literally nothing is that you should have a `cudaDeviceSynchronize()` after your kernel calls. – Robert Crovella Mar 01 '22 at 15:32
OK, thanks, but why doesn't the program wait until the kernel ends? – Vadim Kashtanov Mar 01 '22 at 15:33
because kernel launches are asynchronous, and you don't have any code in your program that tells it to wait until the kernel ends. – Robert Crovella Mar 01 '22 at 15:34
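
The point made in these comments can be illustrated with a minimal sketch. The kernel name `hello` and the error-handling style are illustrative assumptions, not part of the original question. The host returns from a kernel launch immediately, so without `cudaDeviceSynchronize()` the process may exit before the device-side printf buffer is flushed:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel used only to produce device-side output.
__global__ void hello() {
    printf("hello from the device\n");
}

int main() {
    hello<<<1, 1>>>();  // asynchronous: control returns to the host right away

    // Catch launch-time errors such as an invalid configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Block until the kernel has finished; this also flushes device printf output.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Profilers such as nvprof wait for outstanding device work before the process ends, which would explain why the output appeared only under `sudo nvprof ./a.out`.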

0 Answers
