I'm looking for a way to render many meshes at once, so that I don't have to issue a draw call for each mesh. I'm dealing with 2D rendering here, and a typical object such as a square may have only two triangles in it. However, an object may also be quite complex and have thousands of triangles.

Now each object can move around by itself. Conceptually it's perfectly reasonable to have a VBO (or VBO/IBO pair) for each "object": As long as the object does not change, all I have to upload to the GPU each frame is the transformation information: a position vector and an orientation value. Or, equivalently, a transformation matrix.

But the problem with that approach is that with a scene of 1000 square objects I'd have 1000 VBOs and 1000 IBOs to initialize, and 1000 draw calls setting 1000 sets of uniforms each frame in order to render 2000 triangles.

Okay. If all of those objects are identical, I can have one VBO/IBO to describe them, set up a Uniform Buffer Object (or perhaps a uniform array is more appropriate -- I still need to learn how to use these) with transformation data for each of them, and issue one instanced draw call, so that the vertex shader uses the instance number it receives to pull the transformation data from the UBO. Great.

I just want to go one step further. I want to do what amounts to instancing on non-identical meshes: I have 1000 different objects, which I am happy to describe in either 1000 separate vertex/index buffer pairs, or one single gigantic pair of vertex/index buffers. I want to send their transformation data to the GPU in one call. It's simply a matter of letting the driver/GPU bind or select the proper vertices.

Can this be done? Can it be done without using SM4 geometry shaders?

Update: I just thought of a potential method to accomplish this. I use a vertex attribute as my "instancing" value with which to index into a UBO that contains transformations. Is this the way to do it?
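A minimal CPU-side sketch of that idea, assuming each vertex carries an integer object ID and the transformations live in an array indexed by that ID. The names `TaggedVertex`, `Transform`, `appendObject` and `transformVertex` are hypothetical, and `transformVertex` only stands in for the lookup a vertex shader would perform against the UBO:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// Position plus orientation, as described in the question (hypothetical layout).
struct Transform { float tx, ty, angle; };

// One vertex of the merged buffer; objectId would be an integer vertex attribute.
struct TaggedVertex {
    float x, y;
    std::uint32_t objectId;
};

// Appends one object's local-space vertices to the shared array,
// tagging each vertex with the ID of the object it belongs to.
void appendObject(std::vector<TaggedVertex>& out,
                  const std::vector<Vec2>& local,
                  std::uint32_t objectId) {
    for (const Vec2& v : local) {
        out.push_back({v.x, v.y, objectId});
    }
}

// CPU stand-in for the vertex shader: look up the transform by ID,
// rotate by the object's angle, then translate.
Vec2 transformVertex(const TaggedVertex& v,
                     const std::vector<Transform>& transforms) {
    assert(v.objectId < transforms.size());
    const Transform& t = transforms[v.objectId];
    const float c = std::cos(t.angle);
    const float s = std::sin(t.angle);
    return {c * v.x - s * v.y + t.tx, s * v.x + c * v.y + t.ty};
}
```

With this layout the vertex data is uploaded once, and only the small `Transform` array would need to change per frame.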

tshepang
Steven Lu
  • This smells of a premature optimization. What makes you think that you *need* to do this, that you can't just render these "1000 objects" the normal way? – Nicol Bolas Mar 03 '12 at 02:32
  • I likely don't *need* it. But I am interested in learning how to use the technology in the best way for my purposes. I want to be able to have a very large number of objects, and if I can draw them all with one call I get more than just the benefit of avoiding function call overhead: I also get viewport culling, and I'm sure some other things, for free. – Steven Lu Mar 03 '12 at 03:27
  • How does that give you viewport culling? Instancing doesn't *cull* anything; it can't. Indeed, the whole point of instancing is to cut down on CPU overhead, so you try to do as little processing per-instance as possible. Even a quick frustum culling should generally be avoided. – Nicol Bolas Mar 03 '12 at 03:39
  • You are right about that. I think I meant that the naive method would allow me to cull manually, which is actually a pro rather than a con. No idea why I mentioned it there. Okay -- there is definitely a big benefit to having a vertex buffer (pair) for each object. This way, if I delete an object, I can delete its buffers, with no more housekeeping necessary. So, what I'm looking for now is a way to issue buffer bind commands without an explicit call. Not possible? – Steven Lu Mar 03 '12 at 03:44
  • That doesn't explain why you feel that drawing "1000 objects" the normal way will be a substantial performance issue. Have you tried it? Is it a bottleneck? How big are these objects anyway? What ways have you tried to render things? – Nicol Bolas Mar 03 '12 at 07:12
  • No, I haven't tried it yet. My guess is this is the sort of thing that might save me a few hundred microseconds per frame at best -- not something to lose sleep over. But my interpretation of your response to my original question of "is it possible to batch multiple VBOs" is: no, this is not possible. In that case I think the proper method, if I decide I want to deal with this, is probably to start assembling the smaller meshes into one VBO and use uniform arrays or buffers to specify their transformations. – Steven Lu Mar 03 '12 at 10:41

1 Answer

You don't need one VBO per object. Just concatenate all the objects into a single VBO, or a small set of VBOs. You can address into that VBO either by adding an offset to the data parameter of the gl*Pointer functions, or by using glDrawElementsBaseVertex to add the offset at draw time. Instead of an index array holding just the 4 indices of a single object, concatenate the index arrays of all the small objects. If you're using a strip or fan primitive, you can set a special index with glPrimitiveRestartIndex that, when encountered, starts a new primitive.

That way you only need to split up your rendering calls by the material settings in use, the overall transformations, and the shader parameters (i.e. textures, shaders, shader uniforms).
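The concatenation described above can be sketched on the CPU side roughly as follows. This is only an illustration under assumptions: `Vertex`, `Mesh`, `MeshRange`, `Batch` and `buildBatch` are hypothetical names, and the vertex layout is reduced to a 2D position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side types for illustration; not part of OpenGL.
struct Vertex { float x, y; };

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // relative to this mesh's own vertices
};

// Where one mesh ends up inside the shared buffers.
struct MeshRange {
    std::int32_t baseVertex;  // added to every index at draw time
    std::size_t firstIndex;   // position of the mesh's first index in the shared array
    std::size_t indexCount;   // number of indices belonging to the mesh
};

struct Batch {
    std::vector<Vertex> vertices;        // contents of the single VBO
    std::vector<std::uint32_t> indices;  // contents of the single IBO
    std::vector<MeshRange> ranges;       // one entry per object
};

Batch buildBatch(const std::vector<Mesh>& meshes) {
    Batch batch;
    for (const Mesh& m : meshes) {
        // Sanity check: every index must refer to a vertex of its own mesh.
        for (std::uint32_t i : m.indices) {
            assert(i < m.vertices.size());
        }
        MeshRange r;
        r.baseVertex = static_cast<std::int32_t>(batch.vertices.size());
        r.firstIndex = batch.indices.size();
        r.indexCount = m.indices.size();
        batch.vertices.insert(batch.vertices.end(),
                              m.vertices.begin(), m.vertices.end());
        // Indices are copied unchanged; baseVertex supplies the offset later.
        batch.indices.insert(batch.indices.end(),
                             m.indices.begin(), m.indices.end());
        batch.ranges.push_back(r);
    }
    return batch;
}
```

With both buffers uploaded and bound, an object described by range `r` could then be drawn with something like `glDrawElementsBaseVertex(GL_TRIANGLES, r.indexCount, GL_UNSIGNED_INT, (void*)(r.firstIndex * sizeof(GLuint)), r.baseVertex)`. Objects that share the same material and transformation can be covered by one call spanning several consecutive ranges, provided their indices are rebased when they are concatenated rather than relying on a per-object base vertex.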

datenwolf
  • If you have a lot of objects, you would end up doing a lot of `glDrawArrays`/`glDrawElements` calls -- one per object. In my experience, having more than a few hundred per frame becomes very slow, even without changing uniform variables or textures and shader programs. I'm wondering if there is something in OpenGL functionality that allows you to draw many elements at once that have entirely different meshes. – axel22 Aug 03 '13 at 11:11
  • @axel22: If everything is done right, the actual bottlenecks when rendering are, first, *fillrate*, i.e. the total memory bandwidth consumed by drawing operations to the framebuffer and by texture fetches from memory, and second, *triangle count*. *glDraw…* calls come with a certain overhead, so the larger the rendering batch triggered by a single *glDraw…* call, the better. You will have to profile your specific program, but a good batch size is somewhere between about 0.5k and 2.5k triangles. – datenwolf Aug 03 '13 at 12:49
  • In my case the number of triangles per mesh is roughly 1200, and I render around 1500-2500 such meshes, depending on the scene. I found that if I copy all of these meshes into one huge VBO once and call `glDrawArrays` on every frame, the animation is smooth with FPS above 60. When I render each mesh separately by calling `glDraw*`, the framerate drops to about 5-15 FPS (on a GTX 650 Ti). My conclusion was that I should call `glDraw*` as few times as possible. My problem is that whenever the scene changes, I have to copy everything to one big VBO, and that takes more than 100ms, causing a glitch. – axel22 Aug 03 '13 at 15:41
  • @axel22: What exactly are the changes that require altering the data in the VBOs? This can be another tight bottleneck. – datenwolf Aug 03 '13 at 16:16
  • In the case of one big VBO, I simply copy the vertex data of all the mesh instances (position, normals and tex coordinates, where I translate the position of each vertex). I do this for about 1500 instances, where each mesh has about 1200 triangles (taking about 100ms). In the case of multiple `glDraw*` calls, the changes that alter the data would be binding a different uniform (e.g. a transformation matrix) for each instance. I haven't actually even done this yet -- I just called `glDraw*` on a single mesh with the same shader program #instances times, for testing reasons. – axel22 Aug 03 '13 at 16:32