
I would like to use texture memory for interpolation of data. I have two arrays (namely A[i] and B[i]) and I want to interpolate between them. I thought I could bind them to texture memory and enable interpolation, but I am not sure how to do that.

The examples that come with CUDA use A[i-1] and A[i+1] for the interpolation.

Is there any way to do what I planned? I'm trying this because I think I can get a good speedup.

Javier Cabero
  • 118
  • 1
  • 12
Silve2611
  • 2,198
  • 2
  • 34
  • 55
  • There is an extensive discussion at [Writing CUDA kernels for interpolation](https://vitalitylearning2021.github.io/interpolationCUDA/) on how to implement interpolation using CUDA texture memory. – Vitality Aug 24 '21 at 06:22

2 Answers

2

If you're not used to developing with CUDA, using texture memory is not the easiest thing to start with.

I'd suggest you try writing a first parallel version of your algorithm in CUDA with no optimisation. Then, run the NVIDIA Visual Profiler on your application to figure out whether you need texture memory to optimise your memory accesses.

Remember that the earlier you optimize, the trickier it is to debug.

Last but not least, the latest CUDA version (CUDA 5, still a release candidate) is able to automatically serve your data through texture memory, as long as you declare the input buffers passed as parameters to your kernel as `const __restrict__` pointers.
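For illustration, a kernel signature along these lines (the names are hypothetical) is what allows the compiler to route the loads through the texture/read-only cache path:

```cuda
// Sketch: the const ... __restrict__ qualifiers tell the compiler the
// inputs are read-only and non-aliased, so it may fetch them through
// the texture (read-only) cache automatically.
__global__ void lerpKernel(const float * __restrict__ A,
                           const float * __restrict__ B,
                           float *out, float f, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = f * A[i] + (1.0f - f) * B[i];
}
```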

jopasserat
  • 5,721
  • 4
  • 31
  • 50
  • Thx for the answer. I already have a first parallel version, but the speedup compared to a CPU is not very impressive. That's why I want to try this out. – Silve2611 Aug 22 '12 at 09:07
  • Run the profiler first. Your problem might be anything but memory accesses. – jopasserat Aug 22 '12 at 09:29
  • It's not that I have a problem. I would just like to compare the results, so it would be helpful to know how to do that. – Silve2611 Aug 22 '12 at 11:16
2

Yes, you can do this with texture memory, and it is fast. I personally use ArrayFire to accomplish these kinds of operations, because it is faster than I can hope to code by hand.

If you want to code it by hand in CUDA, something like this is what you want:

// outside kernel (host code); repeat the setup below for B

texture<float, 1> A;
texture<float, 1> B;

cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaArray *arrA = NULL;
cudaError_t e = cudaMallocArray(&arrA, &desc, length);     // 1D array of 'length' floats
e = cudaMemcpyToArray(arrA, 0, 0, hostA,                   // copy data in before binding
                      length * sizeof(float), cudaMemcpyHostToDevice);
A.filterMode = cudaFilterModePoint;       // no hardware filtering; we blend manually
A.addressMode[0] = cudaAddressModeClamp;
cudaBindTextureToArray(A, arrA, desc);

...

// inside kernel
// tex1D takes a single float coordinate; +0.5f samples the centre of texel idx
float valA = tex1D(A, idx + 0.5f);
float valB = tex1D(B, idx + 0.5f);

float f = 0.5f;
output = f * valA + (1.0f - f) * valB;

If you just want to plug in ArrayFire (which in my experience is faster than what I can code by hand, not to mention way simpler to use), then you'll want:

// in arrayfire
array A = randu(10,1);
array B = randu(10,1);
float f = 0.5;
array C = (f)*A + (1-f)*B;

The above assumes you want to interpolate between corresponding indices of 2 different arrays or matrices. There are other interpolation functions available too.

Ben Stewart
  • 311
  • 1
  • 5