I am trying to implement a box filter in CUDA C, starting with a matrix-averaging problem. When I run the following code with the lines inside the for loops uncommented, I get a certain output. But when I comment those lines out, it generates the same output again!
if(tx==0)
    for(int i=1;i<=radius;i++)
    {
        //sharedTile[radius+ty][radius-i] = 6666.0;
    }
if(tx==(Dx-1))
    for(int i=0;i<radius;i++)
    {
        //sharedTile[radius+ty][radius+Dx+i] = 7777;
    }
if(ty==0)
    for(int i=1;i<=radius;i++)
    {
        //sharedTile[radius-i][radius+tx]= 8888;
    }
if(ty==(Dy-1))
    for(int i=0;i<radius;i++)
    {
        //sharedTile[radius+Dy+i][radius+tx] = 9999;
    }
if((tx==0)&&(ty==0))
    for(int i=globalRow,l=0;i<HostPaddedRow,l<radius;i++,l++)
    {
        for(int j=globalCol,m=0;j<HostPaddedCol,m<radius;j++,m++)
        {
            //sharedTile[l][m]=8866;
        }
    }
if((tx==(Dx-1))&&(ty==(Dx-1)))
    for(int i=(HostPaddedRow+1),l=(radius+Dx);i<(HostPaddedRow+1+radius),l<(TILE+2*radius);i++,l++)
    {
        for(int j=HostPaddedCol,m=(radius+Dx);j<(HostPaddedCol+radius),m<(TILE+2*radius);j++,m++)
        {
            //sharedTile[l][m]=7799.0;
        }
    }
if((tx==(Dx-1))&&(ty==0))
    for(int i=(globalRow),l=0;i<HostPaddedRow,l<radius;i++,l++)
    {
        for(int j=(HostPaddedCol+1),m=(radius+Dx);j<(HostPaddedCol+1+radius),m<(TILE+2*radius);j++,m++)
        {
            //sharedTile[l][m]=9966;
        }
    }
if((tx==0)&&(ty==(Dy-1)))
    for(int i=(HostPaddedRow+1),l=(radius+Dy);i<(HostPaddedRow+1+radius),l<(TILE+2*radius);i++,l++)
    {
        for(int j=globalCol,m=0;j<HostPaddedCol,m<radius;j++,m++)
        {
            //sharedTile[l][m]=0.0;
        }
    }
__syncthreads();
You can ignore the for-loop conditions; they are irrelevant here. My basic question is: why am I getting the same values even after commenting those lines out? I tried making some modifications to my main program and to the kernel as well. I also introduced deliberate errors, removed them, and then recompiled and executed the same code, but I still get the same values. Is there any way to clear cache memory in CUDA? I am using Nsight + RedHat + CUDA 5.5. Thanks in advance.