For illustrative purposes, let `__device__ void distance(char *s1, char *s2)` be the device function, which is run across several blocks and threads by a kernel launched as `compute<<<1024,256>>>(s1, s2, s3)`.
We can assume `char *s1` and `char *s2` are generated prior to issuing any CUDA instructions, and that they remain constant throughout the execution of all kernels. Is there a way to allocate `s1` and `s2` such that transferring them to all threads is optimized? Is using the `__constant__` declaration an appropriate way to optimize the data transfer? I'm using a device with compute capability 8.0+.
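To make the question concrete, here is roughly the approach I am considering (a minimal sketch; `MAX_LEN`, the string contents, and the empty body of `distance` are placeholders, and the kernel no longer takes `s1`/`s2` as arguments because `__constant__` symbols are file-scope globals):

```cuda
#include <cuda_runtime.h>

#define MAX_LEN 256  // placeholder upper bound on string length

// Candidate approach: place the read-only inputs in constant memory.
__constant__ char c_s1[MAX_LEN];
__constant__ char c_s2[MAX_LEN];

__device__ void distance(const char *s1, const char *s2)
{
    // placeholder: the real function compares s1 and s2
}

__global__ void compute(char *s3)
{
    // every thread reads the same constant strings
    distance(c_s1, c_s2);
}

int main()
{
    const char h_s1[MAX_LEN] = "first string";   // placeholder data
    const char h_s2[MAX_LEN] = "second string";  // placeholder data

    // copy the host strings into constant memory once, before any launch
    cudaMemcpyToSymbol(c_s1, h_s1, MAX_LEN);
    cudaMemcpyToSymbol(c_s2, h_s2, MAX_LEN);

    char *d_s3;
    cudaMalloc(&d_s3, MAX_LEN);

    compute<<<1024, 256>>>(d_s3);
    cudaDeviceSynchronize();

    cudaFree(d_s3);
    return 0;
}
```

Is this the right way to expose the two strings to all threads, or is there a better-suited mechanism on this architecture?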