I have successfully implemented a simple 2D game using LWJGL (OpenGL) where objects fade away as they get further from the player. This fading was initially implemented by computing the distance from the player to the origin of each object and using it to scale the object's alpha/opacity.

However, with larger objects this approach looks too coarse. My solution was to scale alpha/opacity for every pixel in the object instead. Not only would this look better, it would also move the computation from the CPU to the GPU.

I figured I could implement it using an FBO and a temporary texture.
By drawing to the FBO and masking it with a precomputed distance map (a texture) using a special blend mode, I intended to achieve the effect. The algorithm is as follows:

0) Initialize OpenGL and set up the FBO
1) Render background to standard buffer
2) Switch to custom FBO and clear it
3) Render objects (to FBO)
4) Mask FBO using distance-texture
5) Switch to standard buffer
6) Render FBO temporary texture (to standard buffer)
7) Render HUD elements

A bit of extra info:

  • The temporary texture has the same size as the window (and thus standard buffer)
  • Step 4 uses a special blend mode to achieve the desired effect:
    GL11.glBlendFunc( GL11.GL_ZERO, GL11.GL_SRC_ALPHA );
  • My temporary texture is created with min/mag filters: GL11.GL_NEAREST
  • The data is allocated using: org.lwjgl.BufferUtils.createByteBuffer(4 * width * height);
  • The texture is initialized using: GL11.glTexImage2D( GL11.GL_TEXTURE_2D, 0, GL11.GL_RGBA, width, height, 0, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, dataBuffer);
  • There are no GL errors in my code.
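
To make the blend mode in step 4 concrete, here is a minimal CPU-side sketch of what `glBlendFunc(GL_ZERO, GL_SRC_ALPHA)` computes per channel: the source colour is multiplied by zero and the destination (the FBO contents) by the source alpha (the distance-map texel). The class and method names are hypothetical, and the rounding is an assumption; real hardware may round differently.

```java
// CPU emulation of glBlendFunc(GL_ZERO, GL_SRC_ALPHA) for 8-bit channels.
// Names are hypothetical and exist only for illustration.
final class MaskBlendSketch {

    /** Blends one 8-bit channel: result = src * 0 + dst * srcAlpha / 255. */
    static int blendChannel(int dst, int srcAlpha) {
        // Round to nearest (an assumption; GPU rounding may differ).
        return (dst * srcAlpha + 127) / 255;
    }

    /** Applies the mask alpha to a packed RGBA pixel laid out as 0xRRGGBBAA. */
    static int maskPixel(int dstRgba, int srcAlpha) {
        int r = blendChannel((dstRgba >>> 24) & 0xFF, srcAlpha);
        int g = blendChannel((dstRgba >>> 16) & 0xFF, srcAlpha);
        int b = blendChannel((dstRgba >>> 8) & 0xFF, srcAlpha);
        int a = blendChannel(dstRgba & 0xFF, srcAlpha);
        return (r << 24) | (g << 16) | (b << 8) | a;
    }
}
```

Because all four channels are scaled by the mask alpha, the masked FBO contents come out as premultiplied alpha.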

This does indeed achieve the desired result. However, when I did a bit of performance testing I found that my FBO approach cripples performance. I tested by requesting 1000 successive renders and measuring the total time. The results were as follows:

In 512x512 resolution:

  • Normal: ~1.7s
  • FBO: ~2.5s
  • (FBO without step 6: ~1.7s)
  • (FBO without step 4: ~1.7s)

In 1680x1050 resolution:

  • Normal: ~1.7s
  • FBO: ~7s
  • (FBO without step 6: ~3.5s)
  • (FBO without step 4: ~6.0s)

As you can see, this scales really badly with resolution. To make it even worse, I intend to do a second pass of this type. The machine I tested on is supposed to be high end relative to my target audience, so I can expect people to get far below 60 fps with this approach, which is hardly acceptable for a game this simple.

What can I do to salvage my performance?

Scarzzurs
  • What exactly should the effect look like? Fading out at the borders or something similar? – Gunther Piez May 06 '11 at 08:47
  • The distance map is currently implemented using `double dist = Math.hypot(nRadius-i,nRadius-j); double a = Math.max( 0, 1 - dist / nRadius );`. Where a*a is used as alpha. – Scarzzurs May 06 '11 at 08:53
  • Did you try to time your FBO method without step 4 ? i.e. (if I understood your method correctly) render to FBO and blit directly to screen without performing your fading effect. This would tell you if the problem lies with FBOs themselves or with the implementation of your effect. – Nicolas Lefebvre May 06 '11 at 09:02
  • Where i,j iterate over all pixels in the object? I cannot answer why your FBO operation is slow, but I would implement it using multitexturing with your distance map as the second texture and maybe additionally scale alpha if a certain distance is reached – Gunther Piez May 06 '11 at 09:11
  • @Bethor Updated post with these results too. It appears as if the rendering of the temporary texture is the biggest sinner here... – Scarzzurs May 06 '11 at 09:15
  • @drhirsch Sorry, new to Stackoverflow, so submitted half-finished post when I tried to create a new line, and then ran out of edit time... But yeah, I basically compute alpha to falloff when moving away from the center with fast initial falloff (quadratic). I'll read up on multitexturing and get back to you. – Scarzzurs May 06 '11 at 09:20
  • So if I'm not mistaken most of the overhead of your FBO version comes from either step 3 or step 6. How are you performing step 6 (transferring the FBO texture to the default buffer)? In my limited FBO experience, blitting an FBO to screen should be close to free, performance-wise. – Nicolas Lefebvre May 06 '11 at 09:33
  • @Bethor In step 6 I simply draw the texture using Ortho(-1,+1,-1,+1, -1.0, +1.0) (projection matrix), the blend function glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA) (since I am using premultiplied alpha), glColor4d(1,1,1,1) and a simple quad in vertex array. Disabling blending in step 6 doesn't appear to change results... – Scarzzurs May 06 '11 at 10:53
  • I'm guessing this is all fixed-function pipeline then? I thought the issue might have been with a shader being used to render the FBO's texture to the screen. Did you try simply blitting the FBO to further isolate your issue? (i.e. a simple glBlitFramebuffer instead of rendering to a quad) – Nicolas Lefebvre May 06 '11 at 11:03
  • @Bethor I'm not completely sure about the term fixed-function pipeline, but I am not using shaders, so I would say so... In the high-resolution version I get a performance cost of ~5.5s using blit, which is some improvement, although this is still higher than the ~3.5s I get from disabling step 6 entirely... – Scarzzurs May 06 '11 at 11:56
  • This is an example where the fragment shader is jumping up and down, shouting "please, please, please, use meeeeee!" :-) Although that's a serious change in the way you have to think and program, you should consider it nevertheless. Squared distance to gl_FragCoord.xy from some point that you pass as uniform boils down to one vector subtract and one dot product. If you want linear attenuation, it's another recp and multiply, but still it's as easy as can be and will be super super fast (around a dozen cycles, no extra texture, no extra blending, no obscure blend modes). – Damon May 06 '11 at 12:26
  • I agree with Damon. Your FBO solution is clever, but ultimately it will always be slow compared to a shader-based approach. It's best to only use FBOs for rendering to textures that don't have to change every frame. – sidewinderguy May 06 '11 at 21:57
  • @Damon @sidewinderguy I see what you mean. I did however expect performance to be better using the built-in blend modes and a few extra textures. I guess FBOs aren't really meant for this stuff... Right, so if I choose to go down the fragment shader path, wouldn't it still be faster to have a pre-rendered texture with the alpha values for lookup, rather than computing the same values over and over? – Scarzzurs May 07 '11 at 13:30

1 Answer

As suggested by Damon and sidewinderguy, I successfully implemented a similar solution using a fragment shader (and a vertex shader). Performance is a little better than my initial CPU-side, per-object computation, which in turn is MUCH faster than my FBO approach. At the same time it gives visual results much closer to the FBO approach (overlapping objects behave a bit differently).

For anyone interested, the fragment shader basically transforms gl_FragCoord.xy and does a texture lookup. I am not sure this gives the best performance, but with only one other texture active I do not expect performance to improve by omitting the lookup and computing the value directly. Also, I no longer have a performance bottleneck, so further optimization can wait until it is found to be required.
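
As a rough illustration of that per-fragment logic, here is a CPU-side reference sketch in Java. It is not the actual shader: the class and method names, the map size of `2 * nRadius` texels per side, and the nearest-neighbour lookup are all assumptions. The falloff itself is the one quoted in the comments on the question (`a = max(0, 1 - dist / nRadius)`, with `a*a` used as alpha).

```java
// Hypothetical CPU reference for the per-fragment alpha; not the real shader.
final class FadeSketch {

    /** Builds a square alpha map (assumed 2*nRadius texels per side). */
    static float[][] buildDistanceMap(int nRadius) {
        int size = 2 * nRadius;
        float[][] map = new float[size][size];
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                double dist = Math.hypot(nRadius - i, nRadius - j);
                double a = Math.max(0, 1 - dist / nRadius);
                map[i][j] = (float) (a * a); // quadratic falloff
            }
        }
        return map;
    }

    /** Transforms a window-space fragment position into a texel and looks it up. */
    static float lookupAlpha(float[][] map, double fragX, double fragY,
                             double playerX, double playerY) {
        int nRadius = map.length / 2;
        int i = (int) Math.floor(fragX - playerX) + nRadius;
        int j = (int) Math.floor(fragY - playerY) + nRadius;
        if (i < 0 || j < 0 || i >= map.length || j >= map.length) {
            return 0f; // outside the map: fully faded
        }
        return map[i][j];
    }

    /** Alternative from the comments: compute the falloff directly, no texture. */
    static float directAlpha(double fragX, double fragY,
                             double playerX, double playerY, double radius) {
        double dx = fragX - playerX;
        double dy = fragY - playerY;
        double a = Math.max(0, 1 - Math.sqrt(dx * dx + dy * dy) / radius);
        return (float) (a * a);
    }
}
```

Both paths give the same alpha at texel centres, so choosing between the lookup and the direct computation is purely a performance question that would need to be measured on the target hardware.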

Also, I am very grateful for all the help, suggestions and comments I received :-)

Scarzzurs
  • Thanks for keeping us updated ! I've only ever used the programmable pipeline, it's interesting to know that on top of being more flexible, it can also be significantly faster than fixed function. – Nicolas Lefebvre May 11 '11 at 12:44