
I originally asked this in a comment, but I think it deserves its own question.

I would like to build a conjugate gradient solver on the iPhone/iPad, as it would open up a range of new possibilities for GPGPU programming on these devices, such as real-time optical flow, real-time simulations, and real-time finite elements.

The very well written chapter from GPU Gems explains how it can be done using floating-point textures.

The first problem I have encountered is that I haven't managed to create a floating-point render-to-texture target. Perhaps I simply don't have the right parameters for my texture setup. For example, in this code the first line succeeds, but the second one fails with a GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT error.

glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, 256, 256, 0, GL_LUMINANCE, GL_HALF_FLOAT_OES, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, textureHandle, 0);

So, my question is: how can I render to a floating-point texture on iPhone/iPad?

Alternatively, is there another way to solve this problem? One messy alternative I thought of would be to read floating-point textures in a fragment shader and then write the resulting 16-bit half-float into a regular gl_FragColor by storing its first 8 bits in the R channel and its second 8 bits in the G channel. I could then read these values back from the framebuffer using glReadPixels, reinterpret them as half-floats, transfer that data to a new texture, and repeat the process. The obvious problem is that this requires a very wasteful round trip from the GPU to the CPU and back to the GPU, and I'm pretty sure it won't give any speed improvement.
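The CPU side of that round trip can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the high byte is assumed to go into R and the low byte into G, and the 16-bit value is assumed to be a standard IEEE 754 half-float bit pattern. The shader side, which is the hard part, is not shown.

```c
#include <stdint.h>
#include <string.h>

/* Split a 16-bit half-float bit pattern into two 8-bit channels.
   Assumption: high byte goes to R, low byte goes to G. */
static void half_bits_to_rg(uint16_t h, uint8_t *r, uint8_t *g) {
    *r = (uint8_t)(h >> 8);
    *g = (uint8_t)(h & 0xFFu);
}

/* Reassemble the bit pattern from the R and G bytes of one pixel,
   e.g. as read back from the framebuffer. */
static uint16_t rg_to_half_bits(uint8_t r, uint8_t g) {
    return (uint16_t)(((uint16_t)r << 8) | g);
}

/* Decode an IEEE 754 half-float bit pattern into a 32-bit float. */
static float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits;
    float value;

    if (exponent == 0) {
        /* Zero or subnormal: value is mantissa * 2^-24. */
        value = (float)mantissa * (1.0f / 16777216.0f);
        return sign ? -value : value;
    }
    if (exponent == 31) {
        /* Infinity or NaN: keep the payload bits. */
        bits = sign | 0x7F800000u | (mantissa << 13);
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    }
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, the bit pattern 0x3C00 splits into R = 0x3C and G = 0x00, and decodes back to 1.0f.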

Does anyone have any ideas?

epatel
twerdster
  • Although half-float textures (and especially render targets) are more likely to be supported by iGPUs, I don't think they will buy you anything in the context of a CG solver, as I doubt that the precision of a half is sufficient for practical problems (where even single precision might already be too little). I suppose you will need a really good preconditioner, and everything beyond a Jacobi preconditioner is much trickier on the GPU than a matrix-vector product (at least if you want to beat CPU mode), especially when using fragment shaders, which are much more limited than CUDA/OpenCL. – Christian Rau Aug 27 '11 at 19:03
  • Luckily we will be using a Jacobi preconditioner. I am going to try the method described here: http://msdn.microsoft.com/en-us/library/ee416413(v=vs.85).aspx. It looks like the best way to emulate a float. Obviously the penalty for doing the encode and decode will be heavy, but it might still be worth it. And finally, if that doesn't work, I will use NEON primitives, which will give a guaranteed 3x to 8x speedup. But that's the most I'll get from NEON. – twerdster Aug 27 '11 at 19:51
  • If you mean this RGBE shared-exponent format, this is even worse. You only get around 9 bits of precision, and only if the exponents are all the same, which they surely won't be. With such extremely reduced-precision formats (which might work for colors), you will surely run into problems when trying to solve real linear equation systems, especially when using a simple Jacobi preconditioner. – Christian Rau Aug 28 '11 at 11:27
  • Like the others said, I'm not at all certain that this is the route you want to take. The CPU in the iPhone and iPad is capable of gigaflop performance in single-precision with correct rounding. The iPad2 is capable of gigaflop performance in double-precision as well. If your goal is to solve "real" problems, you are likely better off staying on the CPU and using "real" IEEE-754 formats. – Stephen Canon Aug 28 '11 at 15:46
  • Thanks for the input. I will try both methods over the next days/weeks and will post my results as the answer to the question. – twerdster Aug 28 '11 at 19:50
  • @Christian The idea isn't to use 3 colors in RGBE format. Instead I will be using RGBA for one float, i.e. 32 bits. – twerdster Aug 28 '11 at 20:23
  • @twerdster But you won't get the number decoded with perfect accuracy, and even then you have to assume a common exponent for all numbers, which again means precision loss for the numbers where this exponent is quite far off. Look at [this question](http://stackoverflow.com/q/7059962/743214); maybe you have a clever idea for decoding the number and can contribute to the question. But keep in mind that you cannot just interpret the RGBA8 bits as the bits of a float in a shader (at least not in ES; GL 3.0 hardware can do that, I think). – Christian Rau Aug 28 '11 at 22:39
  • I think I found the correct way to do it and I have posted a response to the OP on that question. It will form part of an answer to this question. – twerdster Aug 29 '11 at 23:52

1 Answer


Look into the Accelerate framework, in particular the vDSP functions.
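For reference, a CPU-side conjugate gradient solver with the Jacobi preconditioner discussed in the comments might look roughly like the sketch below. This is not the answerer's code: it is plain single-precision C with a dense row-major matrix, and the function names (`dot`, `matvec`, `pcg_jacobi`) are hypothetical. It assumes the matrix is symmetric positive definite with a nonzero diagonal. The inner loops are the kind of vector operations that vDSP routines such as `vDSP_dotpr` are meant to accelerate, though the sketch deliberately does not call Accelerate.

```c
#include <stddef.h>
#include <stdlib.h>

/* Dot product of two length-n vectors. */
static float dot(const float *x, const float *y, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += x[i] * y[i];
    return s;
}

/* y = A x for a dense row-major n x n matrix. */
static void matvec(const float *A, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; ++i) y[i] = dot(A + i * n, x, n);
}

/* Solve A x = b with Jacobi-preconditioned conjugate gradients.
   x holds the initial guess on entry and the result on return.
   Stops when the residual norm drops to tol or after max_iter steps.
   Returns the number of iterations, or -1 if allocation fails. */
static int pcg_jacobi(const float *A, const float *b, float *x,
                      size_t n, int max_iter, float tol) {
    float *r = malloc(4 * n * sizeof *r);
    if (!r) return -1;
    float *z = r + n, *p = z + n, *q = p + n;

    matvec(A, x, q, n);
    for (size_t i = 0; i < n; ++i) {
        r[i] = b[i] - q[i];          /* initial residual */
        z[i] = r[i] / A[i * n + i];  /* Jacobi: divide by the diagonal */
        p[i] = z[i];
    }
    float rz = dot(r, z, n);

    int k = 0;
    while (k < max_iter && dot(r, r, n) > tol * tol) {
        matvec(A, p, q, n);
        float alpha = rz / dot(p, q, n);
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            z[i] = r[i] / A[i * n + i];
        }
        float rz_new = dot(r, z, n);
        float beta = rz_new / rz;
        for (size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
        ++k;
    }
    free(r);
    return k;
}
```

For the 2x2 system with A = {4, 1, 1, 3} and b = {1, 2}, starting from x = {0, 0}, the solver should approach x = (1/11, 7/11).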

Robert Diamond