I am offloading work to GPU using OpenCL (a variant of matrix multiplication). The matrix code itself works fantastically well, but the cost of moving data to GPU is prohibitive.
I've moved from using clEnqueueRead/clEnqueueWrite to memory mapped buffers as follows:
d_a = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_ALLOC_HOST_PTR,
sizeof(char) * queryVector_size,
NULL, NULL);
checkErr(err,"Buf A");
d_b = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_ALLOC_HOST_PTR,
sizeof(char) * segment_size,
NULL, NULL);
checkErr(err,"Buf B");
err = clSetKernelArg(ko_smat, 0, sizeof(cl_mem), &d_c);
checkErr(err,"Compute Kernel");
err = clSetKernelArg(ko_smat, 1, sizeof(cl_mem), &d_a);
checkErr(err,"Compute Kernel");
err = clSetKernelArg(ko_smat, 2, sizeof(cl_mem), &d_b);
checkErr(err,"Compute Kernel");
query_vector = (char*) clEnqueueMapBuffer(commands, d_a, CL_TRUE,CL_MAP_READ, 0, sizeof(char) * queryVector_size, 0, NULL, NULL, &err);
checkErr(err,"Write A");
segment_data = (char*) clEnqueueMapBuffer(commands, d_b, CL_TRUE,CL_MAP_READ, 0, sizeof(char) * segment_size, 0, NULL, NULL, &err);
checkErr(err,"Write B");
// code which initialises buffers using ptrs (segment_data and queryV)
err = clEnqueueUnmapMemObject(commands,
d_a,
query_vector, 0, NULL, NULL);
checkErr(err,"Unmap Buffer");
err = clEnqueueUnmapMemObject(commands,
d_b,
segment_data, 0, NULL, NULL);
checkErr(err,"Unmap Buff");
err = clEnqueueNDRangeKernel(commands, ko_smat, 2, NULL, globalWorkItems, localWorkItems, 0, NULL, NULL);
err = clFinish(commands);
checkErr(err, "Execute Kernel");
result = (char*) clEnqueueMapBuffer(commands, d_c, CL_TRUE,CL_MAP_WRITE, 0, sizeof(char) * result_size, 0, NULL, NULL, &err);
checkErr(err,"Write C");
printMatrix(result, result_row, result_col);
This code works fine when I use the ReadEnqueue/WriteEnqueue methods and intialise d_a, d_b, d_c through that, but when I use the MappedBuffers, result is 0 due to d_a and d_b being null when running the kernel.
What is the appropriate way to map/unmap buffers?
EDIT: the core problem seems to be from here
segment_data = (char*) clEnqueueMapBuffer(commands, d_b, CL_TRUE,CL_MAP_READ, 0, sizeof(char) * segment_width * segment_length, 0, NULL, NULL, &err);
// INITIALISE
printMatrix(segment_data, segment_length, segment_width);
// ALL GOOD
err = clEnqueueUnmapMemObject(commands,
d_b,
segment_data, 0, NULL, NULL);
checkErr(err,"Unmap Buff");
segment_data = (char*) clEnqueueMapBuffer(commands, d_b, CL_TRUE,CL_MAP_READ, 0, sizeof(char) * segment_width * segment_length, 0\
, NULL, NULL, &err);
printMatrix(segment_data, segment_length, segment_width);
// ALL ZEROs again
The first printMatrix() returns the correct output, once I unmap it and remap it, segment_data becomes all 0s (it's initial value). I suspect I'm using an incorrect flag somewhere? I cant' figure out where though.