Could D's mixins be used to map linear algebra operations to CPU code, to GPU code such as OpenCL kernels or GLSL shader functions, or to both? This would be a real killer application for D and would better bridge logic targeted at both CPU and GPU execution. Compare this with glm and D's gl3n, which only compile fixed-size linear algebra to CPU code.
VexCL is a proof of concept for this using OpenCL and C++11 (GCC 4.6 or later). It completely abstracts away backend-dependent (CPU/GPU) implementation details of memory allocation and code execution, somewhat like C++ AMP. So things can only get better in D, right? Can mixins completely replace the C++ expression templates used in VexCL? Here's a nice tutorial on its use.
CTFE (compile-time function evaluation) may also play a role here.
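To make the question concrete, here is a minimal sketch of what I have in mind, assuming the operation can be written as a single element-wise expression string. The helper names `cpuFunction` and `openclKernel`, the function name `scaleAdd`, and the fixed `a`, `b`, `r` parameter names are all made up for illustration; this is not how VexCL or any existing D library does it. CTFE builds both source strings at compile time: one is compiled into the program with a string mixin, the other is kept as OpenCL C source that would be handed to the OpenCL runtime (not shown).

```d
import std.stdio : writeln;

// Hypothetical helper: builds D source for a CPU loop that evaluates
// `expr` for every index i. Plain string concatenation, so it works in CTFE.
string cpuFunction(string name, string expr)
{
    return "void " ~ name ~ "(const float[] a, const float[] b, float[] r) { "
         ~ "foreach (i; 0 .. r.length) r[i] = " ~ expr ~ "; }";
}

// Hypothetical helper: builds OpenCL C kernel source for the same expression.
string openclKernel(string name, string expr)
{
    return "__kernel void " ~ name ~ "(__global const float* a, "
         ~ "__global const float* b, __global float* r) { "
         ~ "size_t i = get_global_id(0); "
         ~ "r[i] = " ~ expr ~ "; }";
}

// One expression string drives both back ends. It only uses syntax that is
// valid in both D and OpenCL C.
enum scaleAddExpr = "a[i] * 2.0f + b[i]";

// CPU path: the generated D code is compiled into this module.
mixin(cpuFunction("scaleAdd", scaleAddExpr));

// GPU path: the kernel source is a compile-time constant; compiling and
// launching it through the OpenCL API is left out of this sketch.
enum scaleAddKernelSource = openclKernel("scaleAdd", scaleAddExpr);

void main()
{
    float[] a = [1.0f, 2.0f, 3.0f];
    float[] b = [10.0f, 20.0f, 30.0f];
    auto r = new float[](3);

    scaleAdd(a, b, r);            // runs the mixed-in CPU loop
    writeln(r);
    writeln(scaleAddKernelSource); // source that would go to the OpenCL compiler
}
```

This only works because the expression happens to be valid in both languages. What I am really asking is whether mixins and CTFE can go further and derive both forms from ordinary D expressions such as `r = a * 2 + b`, the way expression templates do in VexCL.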