In the piece of code here, I came across a struct for defining and using shared memory. I modified the allocation to be static and used it in a test program like the one below:
#include <stdio.h>
template<class T, uint bDim>
struct SharedMemory
{
    __device__ inline operator T *() {
        __shared__ T __smem[ bDim ];
        return (T*) (void *) __smem;
    }
    __device__ inline operator const T *() const {
        __shared__ T __smem[ bDim ];
        return (T*) (void *) __smem;
    }
};
template <uint bDim>
__global__ void myKernel() {
    SharedMemory<uint, bDim> myShared;
    myShared[ threadIdx.x ] = threadIdx.x;
    __syncthreads();
    // %u matches the unsigned type of threadIdx.x and of the stored values
    printf("%u\tsees\t%u\tat two on the circular right.\n", threadIdx.x, myShared[ ( threadIdx.x + 2 ) & 31 ]);
}
int main() {
    myKernel<32><<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
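For reference, operator T *() and operator const T *() const in this struct are user-defined conversion functions, not overloads of the dereference operator*: they let a SharedMemory object be used wherever a T* is expected. The host-only C++ sketch below shows the same mechanism under that reading; the type Wrapper and the helper write_then_read are hypothetical, and a function-local static array stands in for the __shared__ array.

```cpp
#include <cstddef>

// Host-only analogue of SharedMemory (hypothetical name Wrapper).
// A function-local static array stands in for the __shared__ array.
template <class T, std::size_t N>
struct Wrapper {
    // Conversion function: no return type is written before "operator";
    // the type after it (T*) is what a Wrapper converts to.
    operator T*() {
        static T storage[N] = {};
        return storage;
    }
};

// Wrapper has no operator[], so for w[i] the compiler converts w to int*
// through operator T*() and applies the built-in subscript, i.e. *(p + i).
inline int write_then_read(std::size_t i, int value) {
    Wrapper<int, 32> w;
    w[i] = value;                          // conversion, then built-in []
    return *(static_cast<int*>(w) + i);    // the same access spelled out
}
```

Declaring T operator*() instead would define a different function, the unary dereference operator returning a T by value: its body would then return a T* where a T is expected, and the struct would no longer convert to a non-const T*, so writes through myShared[i] would fail as well.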
The kernel works as expected. However, I have a few questions about this usage:
First, I don't understand the syntax used in the operator-overloading section of the SharedMemory struct. Is it overloading the dereference operator*? If so, how does an access via square brackets translate into a pointer dereference? Also, why does changing the line
__device__ inline operator T *() {
into
__device__ inline T operator *() {
produce compiler errors?
Second, I wanted to make the wrapper easier to use by overloading the assignment operator or defining a member function, so that each thread updates the shared memory location corresponding to its thread index. For example, writing
myShared = 47;
or
myShared.set( 47 );
would translate into
myShared[threadIdx.x] = 47;
behind the scenes. However, I have not been able to make this work. It compiles fine, but the shared memory buffer reads back as all zeros
(which I think is the default shared memory initialization in Debug mode). Can you please tell me what I'm doing wrong? Here's my attempt:
template<class T, uint bDim>
struct SharedMemory
{
    __device__ inline operator T*() {
        __shared__ T __smem[ bDim ];
        return (T*) (void *) __smem;
    }
    __device__ inline operator const T *() const {
        __shared__ T __smem[ bDim ];
        return (T*) (void *) __smem;
    }
    __device__ inline T& operator=( const T& __in ) {
        __shared__ T __smem[ bDim ];
        __smem[ threadIdx.x ] = __in;
        return (T&) __smem[ threadIdx.x ];
    }
    __device__ inline void set( const T __in ) {
        __shared__ T __smem[ bDim ];
        __smem[ threadIdx.x ] = __in;
    }
};
For the member function, the compiler emits this warning:
variable "__smem" was set but never used
I am aware that member variables cannot be __shared__, but I suspect I am making a wrong assumption somewhere, or that what I want to do does not match the semantics of the __shared__ qualifier. I appreciate the help.
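One likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: a __shared__ array declared inside a function has implied static storage, much like a function-local static in C++, so every member function in the attempt above declares its own distinct __smem. In that case set() and operator= write to arrays that operator T*() never returns, which would also account for the "set but never used" warning. The host-only C++ analogue below uses function-local statics in place of __shared__ and a global thread_idx in place of threadIdx.x; the names Buggy, Fixed, storage and thread_idx are hypothetical.

```cpp
#include <cstddef>

// Stand-in for threadIdx.x (hypothetical; plain host code has no thread index).
static std::size_t thread_idx = 0;

// Mirrors the attempt: each member declares its own function-local static,
// so set() writes an array that operator T*() never hands out.
template <class T, std::size_t N>
struct Buggy {
    operator T*() {
        static T smem[N] = {};
        return smem;
    }
    void set(const T& v) {
        static T smem[N] = {};   // a second, unrelated array
        smem[thread_idx] = v;    // only ever written, like the __smem in the warning
    }
};

// Fix: exactly one declaration of the storage, owned by storage();
// every other member goes through it.
template <class T, std::size_t N>
struct Fixed {
    static T* storage() {
        static T smem[N] = {};
        return smem;
    }
    operator T*() { return storage(); }
    void set(const T& v) { storage()[thread_idx] = v; }
    Fixed& operator=(const T& v) {   // f = 47; writes this "thread's" slot
        storage()[thread_idx] = v;
        return *this;
    }
};
```

Under the same assumption, the CUDA counterpart would keep a single __shared__ T __smem[ bDim ]; declaration inside one static __device__ member function and have both conversion operators, operator= and set() call it with threadIdx.x as the index; a __syncthreads() would still be needed before threads read each other's values.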