The relation between your question about why "the results are basically zero" and P2P access with UVA is not immediately clear to me.

"is that true or am I doing something wrong?"

It's hard to say, as your question is a bit vague and no complete example is shown.
__constant__ float M_ allocates a variable M_ in the constant memory of all CUDA-visible devices. In order to set the value on multiple devices, you should do something like this:
#include &lt;cstdio&gt;
#include &lt;cuda_runtime.h&gt;

__constant__ float M_; // <= This declares M_ in the constant memory of all CUDA visible devices

__global__ void showMKernel() {
    printf("****** M_ = %f\n", M_);
}

int main()
{
    float M = 2.0f;

    // Make sure that the return values are properly checked for cudaSuccess ...
    int deviceCount = -1;
    cudaGetDeviceCount(&deviceCount);

    // Set M_ in the constant memory of each device:
    for (int i = 0; i < deviceCount; i++) {
        cudaSetDevice(i);
        cudaMemcpyToSymbol(M_, &M, sizeof(float), 0, cudaMemcpyDefault);
    }

    // Now, run a kernel to show M_:
    for (int i = 0; i < deviceCount; i++) {
        cudaSetDevice(i);
        printf("Device %d :\n", i);
        showMKernel<<<1,1>>>();
        cudaDeviceSynchronize();
    }
    return 0;
}
which prints:

Device 0 :
****** M_ = 2.000000
Device 1 :
****** M_ = 2.000000
// so on for other devices
Now, if I replace

// Set M_ in the constant memory of each device:
for (int i = 0; i < deviceCount; i++) {
    cudaSetDevice(i);
    cudaMemcpyToSymbol(M_, &M, sizeof(float), 0, cudaMemcpyDefault);
}

with

cudaMemcpyToSymbol(M_, &M, sizeof(float), 0, cudaMemcpyDefault);
this will only set the value of M_ on the active device, and the program therefore prints
Device 0 :
****** M_ = 2.000000
Device 1 :
****** M_ = 0.000000 // <= I assume this is what you meant by 'the results are basically zero'
// M_ = 0 for other devices too
"are there any other kind of memory that also cannot be accessed this way (texture memory for example)?"

Again, I'm not entirely sure what "this way" refers to. I think that, generally, you cannot access the constant memory or the texture memory of one device from any other device, though I am not 100% certain.
UVA assigns a single address space to the CPU memory and to the global memory of all devices, so that copies between the host and the global memory of multiple devices can be done simply with cudaMemcpy using the kind cudaMemcpyDefault.
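For concreteness, here is a minimal sketch of what that buys you (the names h_src, h_dst, d_buf0, and d_buf1 are illustrative; it assumes at least two visible devices, and error checking is omitted for brevity):

#include &lt;cstdio&gt;
#include &lt;cuda_runtime.h&gt;

int main()
{
    // Again, check all return values for cudaSuccess in real code.
    float h_src = 3.0f, h_dst = 0.0f;
    float *d_buf0 = nullptr, *d_buf1 = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&d_buf0, sizeof(float));
    cudaSetDevice(1);
    cudaMalloc(&d_buf1, sizeof(float));

    // With UVA, cudaMemcpyDefault lets the runtime infer the copy
    // direction from the pointer values alone:
    cudaMemcpy(d_buf0, &h_src, sizeof(float), cudaMemcpyDefault); // host -> device 0
    cudaMemcpy(d_buf1, d_buf0, sizeof(float), cudaMemcpyDefault); // device 0 -> device 1
    cudaMemcpy(&h_dst, d_buf1, sizeof(float), cudaMemcpyDefault); // device 1 -> host

    printf("h_dst = %f\n", h_dst); // expected: 3.000000

    cudaSetDevice(0);
    cudaFree(d_buf0);
    cudaSetDevice(1);
    cudaFree(d_buf1);
    return 0;
}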
Also, P2P communication between devices allows for direct access to and transfer of data between the global memories of multiple devices.
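The sketch below shows the usual P2P pattern (assuming devices 0 and 1; cudaDeviceEnablePeerAccess only succeeds when cudaDeviceCanAccessPeer reports support, and error checking is again omitted):

#include &lt;cuda_runtime.h&gt;

int main()
{
    // Check all return values for cudaSuccess in real code.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1); // can device 0 map device 1's memory?

    float *d0 = nullptr, *d1 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&d0, sizeof(float));
    cudaSetDevice(1);
    cudaMalloc(&d1, sizeof(float));

    if (canAccess) {
        // Enable direct access from device 0 to device 1 (flags must be 0).
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        // With peer access enabled, a kernel launched on device 0 may
        // dereference d1 directly.
    }

    // A direct global-memory copy between devices; this works with or
    // without peer access (without it, the runtime stages through the host):
    cudaMemcpyPeer(d1, 1, d0, 0, sizeof(float));

    cudaSetDevice(0);
    cudaFree(d0);
    cudaSetDevice(1);
    cudaFree(d1);
    return 0;
}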
Similar to the __constant__ example above, when you declare a texture reference like texture<float> some_texture, an instance of some_texture is created for each visible device; however, you need to explicitly bind your device memory to some_texture on each device when working with multiple devices.
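As a rough illustration, here is a per-device binding sketch using the legacy texture reference API (deprecated, and removed entirely in CUDA 12, so this assumes an older toolkit; d_buf and h_val are illustrative names):

#include &lt;cuda_runtime.h&gt;

// One instance of this texture reference exists per visible device.
texture<float> some_texture;

int main()
{
    // Check all return values for cudaSuccess in real code.
    int deviceCount = -1;
    cudaGetDeviceCount(&deviceCount);

    float h_val = 2.0f;
    for (int i = 0; i < deviceCount; i++) {
        cudaSetDevice(i);
        float *d_buf = nullptr;
        cudaMalloc(&d_buf, sizeof(float));
        cudaMemcpy(d_buf, &h_val, sizeof(float), cudaMemcpyDefault);
        // The binding is per device: binding on device 0 does not bind
        // the instance of some_texture that lives on device 1.
        cudaBindTexture(nullptr, some_texture, d_buf, sizeof(float));
    }
    return 0;
}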