
I am trying to learn PyCUDA and I have a few questions that I am trying to understand. I think my main question is how to communicate between PyCUDA and a function inside a CUDA file.

So, if I have a C++ file (a CUDA file) with some functions in it and I want to implement one of them with PyCUDA. For example, let's say I want the function 'compute', which contains some arrays and does calculations on them. What would be my approach?

1) Initialize the arrays in Python, allocate memory on the GPU and transfer the data to the GPU.

2) Call mod = SourceModule("""__global__ void ...""") from PyCUDA.

Now, I want to ask: how will I handle this module? Will I put the whole 'compute' function in it? Because if I only do some calculations in the `__global__` function, I don't know how to communicate between PyCUDA and the C++ functions. How will I pass my results back to the C++ file (CUDA file)?

3) In CUDA we have the number of threads per block as 'blockDim' and the number of blocks as 'gridDim'. In PyCUDA? We have a block size, block=(4,4,1), which means 16 threads? And a grid size, grid=(16,16), which means 256 blocks?

4) I tried to do in PyCUDA an example from the 'CUDA by Example' book which adds vectors. The code is below:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import scipy as sc



N=50*1024

a=sc.arange(0,N).astype(sc.float32)
a_gpu = cuda.mem_alloc(a.nbytes) #allocate memory on GPU
cuda.memcpy_htod(a_gpu, a) #transfer data to the GPU

b=sc.array([i**2 for i in range(0,N)]).astype(sc.float32)
b_gpu = cuda.mem_alloc(b.nbytes)#allocate memory on GPU
cuda.memcpy_htod(b_gpu, b) #transfer data to the GPU

c=sc.zeros(N).astype(sc.float32)
c_gpu = cuda.mem_alloc(c.nbytes)#allocate memory on GPU


mod = SourceModule("""
    __global__ void add(int *a, int *b, int *c){
        int tid = threadIdx.x + blockIdx.x * gridDim.x;
        while (tid < N){
            c[tid] = a[tid] + b[tid];
            tid += blockDim.x * gridDim.x;
        }
    }
    """)

#call the function(kernel)
func = mod.get_function("add")
func(a_gpu,b_gpu,c_gpu, block=(16,16,1),grid=(16,16))

#transfer data back to CPU
cuda.memcpy_dtoh(c, c_gpu)

but it gives me an error: identifier "N" is undefined.

Thanks!

George
  • You have asked four almost completely different questions here. Could you try and rewrite the question to narrow the scope a bit? – talonmies Dec 01 '11 at 11:50
  • Maybe you forgot to pass `N` (used in the kernel function) into the kernel (as a parameter, for example). – Yappie Dec 01 '11 at 12:01
  • @Yappie: I tried to pass N also in the `__global__` function (and in the Python file) but still the same. – George Dec 01 '11 at 12:06
  • @talonmies: It's the approach that I am asking about, and how to communicate between PyCUDA and C++ functions. How to pass results from one to the other. It's all together, I can't split it. – George Dec 01 '11 at 12:07
  • @George: if it is the C++/PyCUDA interoperability approach you are asking about, then why include the Python code and syntax error question? They have nothing to do with one another. – talonmies Dec 01 '11 at 12:22

1 Answer


The way I use PyCUDA, and the way I think it is intended to be used, is as a bridge interface between Python and CUDA. It's not a Python->C++ interface tool; for that you will have to look at something like SWIG. I wouldn't use PyCUDA inside C++ code to interface with a GPU. Instead, I would prototype or design my application using PyCUDA and later move it to using C++ only.

With that in mind, I'll try to tackle your questions:

1) With PyCUDA you could also use the gpuarray module, which handles the allocation and transfer steps for you, so you can just instantiate arrays and use them on the GPU:

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a = gpuarray.arange(400, dtype=numpy.float32)
b = gpuarray.arange(400, dtype=numpy.float32)
# call a CUDA kernel, passing 'a' and 'b'
resulta = a.get()
resultb = b.get()

2) Again, PyCUDA is not a C++ interface. If you need the results to go from CUDA -> Python -> C++, I don't think you need Python in the middle.

3) Yes, block=(4,4,1) is 16 threads per block and grid=(16,16) is 256 blocks.
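The totals just multiply out, so you can sanity-check them in plain Python without touching the GPU:

```python
# block and grid tuples as you would pass them to a PyCUDA kernel launch
block = (4, 4, 1)   # threads per block along x, y, z
grid = (16, 16)     # blocks per grid along x, y

threads_per_block = block[0] * block[1] * block[2]    # 4 * 4 * 1 = 16
blocks_per_grid = grid[0] * grid[1]                   # 16 * 16   = 256
total_threads = threads_per_block * blocks_per_grid   # 16 * 256  = 4096

print(threads_per_block, blocks_per_grid, total_threads)  # 16 256 4096
```

So block=() counts threads inside one block and grid=() counts blocks; the launch runs their product in total.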

Edit:

To answer some of your comments:

Yes, block=(4,1,1) is one-dimensional (4 threads, not 4 blocks) and block=(4,4,1) is 2D (16 threads).

I fixed your code: the main thing was to pass N into the CUDA kernel. (I also changed the pointer types to float to match the float32 arrays, and used blockDim.x in the thread index, as in the book.)

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import scipy as sc



N=50*1024

a=sc.arange(0,N).astype(sc.float32)
a_gpu = cuda.mem_alloc(a.nbytes) #allocate memory on GPU
cuda.memcpy_htod(a_gpu, a) #transfer data to the GPU

b=sc.array([i**2 for i in range(0,N)]).astype(sc.float32)
b_gpu = cuda.mem_alloc(b.nbytes)#allocate memory on GPU
cuda.memcpy_htod(b_gpu, b) #transfer data to the GPU

c=sc.zeros(N).astype(sc.float32)
c_gpu = cuda.mem_alloc(c.nbytes)#allocate memory on GPU


mod = SourceModule("""
    __global__ void add(float *a, float *b, float *c, int N){
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
        while (tid < N){
            c[tid] = a[tid] + b[tid];
            tid += blockDim.x * gridDim.x;
        }
    }
    """)

#call the function(kernel)
func = mod.get_function("add")
func(a_gpu,b_gpu,c_gpu, sc.int32(N), block=(16,16,1),grid=(16,16))

#transfer data back to CPU
cuda.memcpy_dtoh(c, c_gpu)
print c

Another way of doing this is to use string substitution on the SourceModule:

mod = SourceModule("""
    __global__ void add(float *a, float *b, float *c){
        const int N = %d;
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
        while (tid < N){
            c[tid] = a[tid] + b[tid];
            tid += blockDim.x * gridDim.x;
        }
    }
    """ % (N))

One last note: when you are using PyCUDA, it generally works as the glue that connects all the different pieces of working with CUDA. It helps you compile, allocate memory, run your kernel, etc. As long as you are using it like this you will be fine.
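On the point raised in the comments about pushing the SourceModule string out into its own .cu file: that is just file I/O on the Python side. A minimal sketch, with the file name and location made up for illustration (the SourceModule call is shown commented out since it needs a GPU):

```python
# Hypothetical layout: keep the kernel in a separate .cu file and read it
# from Python before compiling. A temp directory stands in for a real
# source tree here.
import os
import tempfile

kernel_source = """
__global__ void add(float *a, float *b, float *c, int N){
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < N){
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;
    }
}
"""

# Write the kernel out once (in practice the .cu file already exists).
path = os.path.join(tempfile.mkdtemp(), "add_kernel.cu")
with open(path, "w") as f:
    f.write(kernel_source)

# Read it back at runtime and hand the text to the compiler.
with open(path) as f:
    source = f.read()

# On a machine with PyCUDA and a GPU you would then do:
# mod = SourceModule(source)
# func = mod.get_function("add")
print(source == kernel_source)  # True
```

This keeps the CUDA code editable with normal C++ tooling while Python stays in charge of compiling and launching it.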

jkysam
  • Hello, maybe I made a mistake. When I say C++ file, I mean to implement CUDA in a C++ file (so, have a CUDA file). So my steps are: 1) take a C++ file and convert it to CUDA. 2) Make a PyCUDA file – George Dec 01 '11 at 16:26
  • Then what will I have in the PyCUDA file? You said above 'I would prototype or design my application using pycuda and later move it to using c++ only.' What do you mean? – George Dec 01 '11 at 16:27
  • If you could give me an example of code which uses a CUDA file and a PyCUDA file, in order to understand better, I'd appreciate it! – George Dec 01 '11 at 16:36
  • Sorry, I'm not sure that I follow what you are looking for. In your example you could push the string under SourceModule to a file, but then you would read that from Python. – jkysam Dec 01 '11 at 18:52
  • OK, I think that's it. Could you tell me, though, why the code above gives me that error? Thanks! – George Dec 03 '11 at 11:14
  • Also, if I say 'block(4,1,1)', does it mean 4 blocks, 1-dimensional, and if I say 'block(4,4,1)', does it mean 16 blocks, 2-dimensional? – George Dec 03 '11 at 11:35
  • Edited the answer to address your questions. – jkysam Dec 03 '11 at 16:20
  • Thanks a lot! So every variable I define in the Python file, I have to pass to the kernel like you did? What else do I have to pass to it? Also, I had a problem and I can't test it! Please check http://stackoverflow.com/questions/8368815/pycuda-cuda-root-not-set-and-nvcc-not-in-path – George Dec 03 '11 at 16:26
  • Yeah, you either pass it through the function or through variable substitution like in the second one. – jkysam Dec 03 '11 at 16:59
  • If I have a complex number (instead of N, which was an integer and you passed like sc.int32(N)), how can I pass it? Just complex(N) or sc.float32(N)? – George Dec 04 '11 at 16:38
  • You can pass float32, which is a C++ float, or float64, which is a C++ double. – jkysam Dec 04 '11 at 23:19