First, I am having a hard time figuring out how clCreateBuffer() works when passed CL_MEM_ALLOC_HOST_PTR. Does it create a buffer on the device AND allocate memory for the host, or does it only allocate memory on the host and cache it on the device when it's being used?
My problem is this: if I have quite a few objects with float* fields that together take more space than is available on my device, is there a better way than telling the runtime to copy the host pointer (CL_MEM_COPY_HOST_PTR) or to use it directly (CL_MEM_USE_HOST_PTR)? Is it possible to have the runtime allocate the host memory itself and use that for all the float* fields, even if they total more memory than the device has? I wouldn't mind telling it to use the host pointer, but to avoid memory copies when the OpenCL device is the CPU I would have to align all of that memory myself.
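To make the alignment concern concrete, here is a minimal sketch of what I think I would need for every float* field. The helper names and the 4096-byte alignment are my own assumptions for illustration; the alignment a given runtime actually needs to avoid a copy is implementation-specific.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment: 4096 bytes (a common page size). The value a given
   OpenCL runtime needs for zero-copy is implementation-specific. */
#define HOST_ALIGNMENT ((size_t)4096)

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: allocate room for `count` floats on an aligned
   boundary. C11 aligned_alloc expects the size to be a multiple of the
   alignment, so the byte count is rounded up. Returns NULL for a zero
   count, on overflow, or if the allocation fails. Release with free(). */
static float *alloc_aligned_floats(size_t count)
{
    if (count == 0 || count > (SIZE_MAX - HOST_ALIGNMENT) / sizeof(float))
        return NULL;
    size_t bytes = round_up(count * sizeof(float), HOST_ALIGNMENT);
    return aligned_alloc(HOST_ALIGNMENT, bytes);
}
```

A pointer obtained this way is what I would then pass as host_ptr to clCreateBuffer() together with CL_MEM_USE_HOST_PTR, which is the bookkeeping I was hoping to avoid.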
Also, are there any tips for working with more memory on the host than is available on the device, so that transfers are as efficient as possible and involve the least copying?
Thanks.