I am trying to learn PyCUDA and have a few questions. My main question is how to communicate between PyCUDA and a function inside a CUDA file.
Say I have a C++ file (a CUDA file) containing several functions, and I want to use PyCUDA in one of them. For example, take a function 'compute' that holds some arrays and does calculations on them. What would my approach be?
1) Initialize the arrays in Python, allocate memory on the GPU, and transfer the data to the GPU.
2) Create a module with mod = SourceModule("""__global__ void ......""") from PyCUDA.
Now I want to ask: how do I handle this module? Do I put the whole 'compute' function in it? If I only do some of the calculations in the __global__ kernel, I don't know how to communicate between PyCUDA and the C++ functions. How do I pass my results back to the C++ (CUDA) file?
3) In CUDA, the number of threads per block is 'blockDim' and the number of blocks is 'gridDim'. What about PyCUDA? Does the block size block=(4,4,1) mean 16 threads per block, and does the grid size grid=(16,16) mean 256 blocks?
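For reference, in PyCUDA the `block` and `grid` keyword arguments of a kernel call set blockDim and gridDim for that launch, so the thread and block counts are just the products of the tuple components. The CPU-only sketch below checks that arithmetic and emulates the 1-D grid-stride loop used in the vector-add kernel, assuming the usual CUDA indexing semantics. The helper names `count_threads` and `grid_stride_indices` are hypothetical and are not part of PyCUDA.

```python
# CPU-only sketch (no GPU needed). Helper names are hypothetical, not PyCUDA API.
from math import prod


def count_threads(block, grid):
    """Return (threads per block, blocks per grid, total threads launched)."""
    per_block = prod(block)   # block=(4, 4, 1) -> 4*4*1 = 16 threads
    per_grid = prod(grid)     # grid=(16, 16)   -> 16*16 = 256 blocks
    return per_block, per_grid, per_block * per_grid


def grid_stride_indices(n, block_dim_x, grid_dim_x):
    """Emulate a 1-D grid-stride loop and count visits per element.

    Mirrors the CUDA pattern:
        tid = threadIdx.x + blockIdx.x * blockDim.x;
        while (tid < n) { ...; tid += blockDim.x * gridDim.x; }
    Returns a list where entry i is how many threads touched element i.
    """
    visits = [0] * n
    stride = block_dim_x * grid_dim_x          # total threads in the grid
    for block_idx in range(grid_dim_x):        # plays the role of blockIdx.x
        for thread_idx in range(block_dim_x):  # plays the role of threadIdx.x
            tid = thread_idx + block_idx * block_dim_x
            while tid < n:
                visits[tid] += 1
                tid += stride
    return visits


if __name__ == "__main__":
    print(count_threads((4, 4, 1), (16, 16)))   # (16, 256, 4096)
    visits = grid_stride_indices(50 * 1024, 256, 128)
    print(all(v == 1 for v in visits))          # True: each element exactly once
```

The offset is `blockIdx.x * blockDim.x`; using `gridDim.x` there instead makes blocks overlap whenever the two differ, so some elements are processed twice and others never. Also, a kernel that only reads `threadIdx.x` gains nothing from a 2-D block such as (16,16,1): the 16 threads that share a `threadIdx.x` compute the same `tid` and repeat the same work.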
4) I tried to port the vector-addition example from the book 'CUDA by Example' to PyCUDA. The code is below:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np
N = 50*1024
a = np.arange(0, N).astype(np.float32)
a_gpu = cuda.mem_alloc(a.nbytes)  # allocate memory on the GPU
cuda.memcpy_htod(a_gpu, a)  # transfer data to the GPU
b = np.array([i**2 for i in range(0, N)]).astype(np.float32)
b_gpu = cuda.mem_alloc(b.nbytes)  # allocate memory on the GPU
cuda.memcpy_htod(b_gpu, b)  # transfer data to the GPU
c = np.zeros(N).astype(np.float32)
c_gpu = cuda.mem_alloc(c.nbytes)  # allocate memory on the GPU
mod = SourceModule("""
__global__ void add(float *a, float *b, float *c){
    int tid = threadIdx.x + blockIdx.x*blockDim.x;
    while (tid<N){
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x*gridDim.x;
    }
}
""")
# call the function (kernel)
func = mod.get_function("add")
func(a_gpu,b_gpu,c_gpu, block=(16,16,1),grid=(16,16))
# transfer data back to the CPU
cuda.memcpy_dtoh(c, c_gpu)
But it fails with the error: identifier "N" is undefined
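The CUDA compiler only sees the string passed to SourceModule, so the Python variable N does not exist inside the kernel. One common workaround, sketched below under the assumption that N is fixed before the module is compiled, is to write N into the source text with a #define. The names `KERNEL_TEMPLATE` and `build_add_kernel_source` are hypothetical. The alternative is to give the kernel an extra `int n` parameter and pass `numpy.int32(N)` at launch time.

```python
# Sketch: bake the Python-side constant N into the CUDA source before compiling.
# Names are hypothetical. Braces are doubled so str.format leaves them literal.
KERNEL_TEMPLATE = """
#define N {n}

__global__ void add(float *a, float *b, float *c)
{{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < N) {{
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;   // grid-stride step
    }}
}}
"""


def build_add_kernel_source(n):
    """Return CUDA source for the float vector-add kernel with N fixed to n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return KERNEL_TEMPLATE.format(n=n)


if __name__ == "__main__":
    src = build_add_kernel_source(50 * 1024)
    print(src.splitlines()[1])   # #define N 51200
    # With PyCUDA installed, src would be handed to
    # pycuda.compiler.SourceModule(src), and the kernel retrieved with
    # mod.get_function("add").
```

The kernel also takes `float *` arguments so that it matches the float32 arrays allocated on the Python side.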
Thanks!