Tiled deferred shading without compute shader

Question

I'm building a deferred renderer, and since I want to support a large number of lights in the scene, I've been looking at tiled deferred shading.

The problem is that I have to target OpenGL 3.3 hardware and it doesn't support GLSL compute shaders.

Is it possible to implement tiled deferred shading with regular (vertex and fragment) shaders?

I don't think so. Tiled deferred rendering requires communication between shader invocations in the same workgroup (e.g. they need to store a common visible light list), which is not possible without a compute shader. — BDL, Jul 16 '16 at 08:50
Mmmm...ok. Any suggestions then? Another approach I was considering is light volumes, but I don't think it scales well with a lot of lights. — zeb, Jul 16 '16 at 09:17

score 2 · Answer 1 · answered Jul 16 '16 at 14:21

Tiled deferred rendering does not strictly require a compute shader. What it requires is that, for each tile, you have a series of lights which it will process. A compute shader is merely one way to accomplish that.

An alternative is to build the light list for each tile's frustum on the CPU, then upload that data to the GPU for its eventual use. Obviously this requires much more memory work than the CS version. But it's probably not that expensive, and it lets you easily experiment with tile sizes to find the one that works best. More tiles means more CPU work and more data to upload, but (generally speaking) fewer lights per tile and more efficient processing.
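To make the CPU step concrete, here is a minimal sketch of binning lights into tiles. It assumes each light has already been projected to a screen-space center and radius in pixels (that projection is not shown), and it uses a conservative bounding-square test rather than a true per-tile frustum test. The function name `bin_lights` and the `(x, y, radius)` tuple layout are made up for illustration.

```python
import math

def bin_lights(lights, width, height, tile):
    """Return bins[ty][tx] = list of light indices touching that tile.

    lights: list of (x, y, radius) in pixels (assumed already projected).
    tile:   tile edge length in pixels.
    """
    tiles_x = math.ceil(width / tile)
    tiles_y = math.ceil(height / tile)
    bins = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for i, (x, y, r) in enumerate(lights):
        # Conservative: take every tile overlapped by the light's bounding square,
        # clamped to the screen. Off-screen lights produce empty ranges.
        x0 = max(int((x - r) // tile), 0)
        x1 = min(int((x + r) // tile), tiles_x - 1)
        y0 = max(int((y - r) // tile), 0)
        y1 = min(int((y + r) // tile), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[ty][tx].append(i)
    return bins

# Example: a 64x64 target with 32-pixel tiles and three lights,
# the last of which is entirely off-screen.
bins = bin_lights([(10, 10, 5), (40, 10, 12), (100, 100, 3)], 64, 64, 32)
```

Changing `tile` is all it takes to try a different tile size, which is the flexibility mentioned above. A real implementation would likely also cull against each tile's depth range.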

One way to do that for GL 3.3-class hardware is to make each tile a separate quad. The quad will be given, as part of its per-vertex parameters, the starting index for its part of the total light list and the total number of lights for that tile to process. The idea is that there is a globally accessible array, and each tile has a contiguous region of this array that it will process.
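The layout described above could be sketched roughly like this: per-tile light lists are concatenated into one flat array, and each tile remembers a `(start, count)` pair that is then copied into all four vertices of its quad. The helper names `flatten_tiles` and `tile_quad` and the `(x, y, start, count)` vertex layout are assumptions for illustration, not a fixed format.

```python
def flatten_tiles(bins):
    """Concatenate per-tile light lists into one flat array.

    bins[ty][tx] is a list of light indices for that tile.
    Returns (indices, ranges) where ranges[ty][tx] = (start, count)
    identifies the tile's contiguous region inside `indices`.
    """
    indices = []
    ranges = []
    for row in bins:
        out_row = []
        for tile_lights in row:
            out_row.append((len(indices), len(tile_lights)))
            indices.extend(tile_lights)
        ranges.append(out_row)
    return indices, ranges

def tile_quad(tx, ty, tile, start, count):
    """Four vertices (x, y, start, count) for one tile, in pixel coordinates.

    start and count are the same on every vertex of the quad, so the
    fragment shader sees them as per-tile constants.
    """
    x0, y0 = tx * tile, ty * tile
    x1, y1 = x0 + tile, y0 + tile
    return [(x0, y0, start, count), (x1, y0, start, count),
            (x0, y1, start, count), (x1, y1, start, count)]

indices, ranges = flatten_tiles([[[0, 1], [1]], [[], [2]]])
start, count = ranges[0][1]
quad = tile_quad(1, 0, 32, start, count)
```

On the GL side, `start` and `count` would most likely be passed as integer attributes (for example via `glVertexAttribIPointer`) and declared `flat` in the shaders, with the fragment shader looping from `start` to `start + count` over the array.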

This array could be the actual lights themselves, or it could be indices into a second (much smaller) array of lights. You'll have to measure the difference to tell if it's worthwhile to have the additional indirection in the access.

The primary array should probably be a buffer texture, since it can get quite large, depending on the number of lights and tiles. If you go with the indirect route, then the array of actual light data will likely fit into a uniform block. But in either case, you're going to need to employ buffer streaming techniques when uploading to it.
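As a rough sanity check on those sizes, the following sketch compares the two arrays against the minimum limits that the OpenGL 3.3 specification guarantees: 16384 bytes for `GL_MAX_UNIFORM_BLOCK_SIZE` and 65536 texels for `GL_MAX_TEXTURE_BUFFER_SIZE`. The 32-byte light size (two `vec4`s) and the figure of 16 lights per tile are assumptions picked only for the arithmetic; real hardware usually reports larger limits, which you can query with `glGetIntegerv`.

```python
import math

# Minimum values guaranteed by the OpenGL 3.3 specification.
MIN_UNIFORM_BLOCK_BYTES = 16384
MIN_TEXTURE_BUFFER_TEXELS = 65536

# Assumption: one light packed as two vec4s (e.g. position+radius, color+intensity).
LIGHT_BYTES = 32

def max_lights_in_uniform_block(block_bytes=MIN_UNIFORM_BLOCK_BYTES,
                                light_bytes=LIGHT_BYTES):
    """How many lights fit in a single uniform block of the given size."""
    return block_bytes // light_bytes

def index_list_texels(width, height, tile, lights_per_tile):
    """Texels needed for the index array, one light index per texel."""
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return tiles * lights_per_tile

lights_in_ubo = max_lights_in_uniform_block()          # 512
texels_needed = index_list_texels(1280, 720, 32, 16)   # 40 * 23 tiles * 16 = 14720
fits = texels_needed <= MIN_TEXTURE_BUFFER_TEXELS
```

Under these assumptions the indirect route leaves room for 512 lights in the uniform block, while the index list for a 1280x720 target with 32-pixel tiles stays well under the guaranteed buffer texture size.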
