I'm fairly new to JCuda (http://www.jcuda.de) and need to work out how to load a CUBIN file and launch its kernel from my Java code, so that I can use CUDA to perform calculations on large arrays of data in a Java program as part of an assignment.
I've tried to follow the "JCudaDriverCubinSample" example, but I'm really struggling to see how to modify that code to work with my own CUBIN file (note: the example itself does run properly).
For example, I'm trying a simple kernel, compiled from this .cu file:
__global__ void multiply_array(int *a, int *b, int *c, int N) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N)
        c[tid] = a[tid] * b[tid];
}
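To make clear what result I expect back from the GPU, here is a minimal plain-Java sketch of the same index arithmetic, run on the CPU. It does not use JCuda at all; the class name MultiplyArrayReference, the block size of 4 and the sample values are just assumptions for illustration. It also shows the ceiling division I assume the launch would need to pick a grid size that covers all N elements.

```java
import java.util.Arrays;

// Hypothetical helper (not part of JCuda): a CPU stand-in for the kernel above.
final class MultiplyArrayReference {

    // Number of blocks needed so that gridSize * blockSize >= n (ceiling division).
    static int gridSizeFor(int n, int blockSize) {
        return (n + blockSize - 1) / blockSize;
    }

    // Emulates a 1D launch: visits every (block, thread) pair and applies the kernel body.
    static int[] multiply(int[] a, int[] b, int blockSize) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("a and b must have the same length");
        }
        int n = a.length;
        int[] c = new int[n];
        int gridSize = gridSizeFor(n, blockSize);
        for (int block = 0; block < gridSize; block++) {
            for (int thread = 0; thread < blockSize; thread++) {
                // Mirrors blockIdx.x * blockDim.x + threadIdx.x
                int tid = block * blockSize + thread;
                // Same bounds check as the kernel: surplus threads in the last block do nothing
                if (tid < n) {
                    c[tid] = a[tid] * b[tid];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        int[] b = {10, 20, 30, 40, 50};
        System.out.println(gridSizeFor(a.length, 4));           // prints 2
        System.out.println(Arrays.toString(multiply(a, b, 4))); // prints [10, 40, 90, 160, 250]
    }
}
```

With 5 elements and a block size of 4, two blocks are needed, and the last three threads of the second block fall outside the tid < N check. The output of this sketch is what I would compare the array copied back from the device against.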
This appears to build into a CUBIN file properly, but I can't work out what changes I need to make to the example code to run it from JCuda using the driver bindings.
Is anyone able to point me towards a solution, or towards material that explains the required code a little more clearly? I'm finding the documentation on the JCuda website pretty sparse (but I'm really not averse to reading, if there is a good resource to learn from).
Thanks!