I'm working on an OS X app on a multi-GPU machine (Late 2013 Mac Pro) that uses OpenCL on the secondary GPU to generate a texture, which is then drawn to the screen with OpenGL on the primary GPU. The app is CPU-bound because of calls to glBindTexture() and glBegin(), both of which spend nearly all of their time in:
_platform_memmove$VARIANT$Ivybridge
which is part of the video driver:
AMDRadeonX4000GLDriver
Setup: creates the OpenGL texture (glPixelBuffer) and then the OpenCL image object that shares its storage (clPixelBuffer).
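#include <stdio.h>
#include <OpenGL/gl.h>
#include <OpenCL/opencl.h>

cl_int clerror = 0;
GLuint glPixelBuffer = 0;
cl_mem clPixelBuffer = 0;
// Allocate a 2048x2048 RGBA texture with no initial data.
glGenTextures(1, &glPixelBuffer);
glBindTexture(GL_TEXTURE_2D, glPixelBuffer);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR); // GL_LINEAR is an enum, so use the integer variant
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 2048, 2048, 0, GL_RGBA, GL_FLOAT, NULL);
glBindTexture(GL_TEXTURE_2D, 0);
// Wrap mip level 0 of the texture as an OpenCL image in the shared context.
clPixelBuffer = clCreateFromGLTexture(_clShareGroupContext, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, glPixelBuffer, &clerror);
if (clerror != CL_SUCCESS)
    fprintf(stderr, "clCreateFromGLTexture failed: %d\n", clerror);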
Drawing code: maps the OpenGL texture onto the viewport; the NSOpenGLView displays nothing but this one texture.
glClear(GL_COLOR_BUFFER_BIT);
glBindTexture(GL_TEXTURE_2D, _glPixelBuffer); // <- most CPU time is spent here...
glBegin(GL_QUADS);                             // <- ...and here
glTexCoord2f(0.f, 0.f); glVertex3f(-1.f,  1.f, 0.f);
glTexCoord2f(0.f, hr);  glVertex3f(-1.f, -1.f, 0.f);
glTexCoord2f(wr,  hr);  glVertex3f( 1.f, -1.f, 0.f);
glTexCoord2f(wr,  0.f); glVertex3f( 1.f,  1.f, 0.f);
glEnd();
glBindTexture(GL_TEXTURE_2D, 0);
glFlush();
The host acquires the texture with clEnqueueAcquireGLObjects(), the OpenCL kernel writes its data into it, and the host then releases it with clEnqueueReleaseGLObjects(). If I understand all of this correctly, the texture data should never pass through main memory.
My question: is it expected that so much CPU time is spent in memmove()? Does it indicate a problem in my code, or perhaps a driver bug? My (unfounded) suspicion is that the texture data travels GPUx -> CPU/RAM -> GPUy, which I'd like to avoid.