Performance in a rendering engine largely depends on how much control it has over what gets rendered. The less control the engine imposes, the less control it has over performance.
So when you say "Any VBO/VAO can be updated independently, so I can't convert this into one big VBO/VAO," that represents basically having no control over any aspect of your vertex input data. That lack of control directly translates to a lack of performance. Any control you can impose on the storage and layout of the meshes being rendered can be translated into improved performance.
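Note: the following assumes you're using the separate attribute format API (glVertexAttribFormat, glVertexAttribBinding and glBindVertexBuffer/glBindVertexBuffers, from OpenGL 4.3/4.4 or ARB_vertex_attrib_binding) for manipulating your VAOs.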
Ideally, all meshes of certain types (skinned, unskinned terrain, UI, etc.) would use the same format for their vertex arrays and the same buffer objects for storage. So rendering a series of meshes would just be a single VAO bind, a call to glBindVertexBuffers, and some number of glDrawElementsBaseVertex or equivalent calls. The goal here is that, while you can allow a few different vertex formats and buffers, the amount of stuff that gets rendered does not affect how many binding calls get done.
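To make the shared-buffer case concrete, here is a minimal sketch of the CPU-side bookkeeping it implies: meshes are packed back to back into one index array, and each mesh remembers its index offset and base vertex. The names MeshData, DrawRange and packMeshes are hypothetical, 32-bit indices are assumed, and the GL calls appear only in comments because they need a live context.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mesh description (names are illustrative only).
struct MeshData {
    std::uint32_t vertexCount;
    std::vector<std::uint32_t> indices; // relative to this mesh's own vertex 0
};

// What each draw needs once all meshes share one vertex and one index buffer.
struct DrawRange {
    std::uint32_t indexCount;      // "count" for glDrawElementsBaseVertex
    std::size_t   indexByteOffset; // "indices" argument, as a byte offset
    std::int32_t  baseVertex;      // "basevertex" argument
};

// Pack meshes back to back and record where each one landed.
std::vector<DrawRange> packMeshes(const std::vector<MeshData>& meshes,
                                  std::vector<std::uint32_t>& sharedIndices) {
    std::vector<DrawRange> ranges;
    std::int32_t nextVertex = 0;
    for (const MeshData& m : meshes) {
        DrawRange r;
        r.indexCount = static_cast<std::uint32_t>(m.indices.size());
        r.indexByteOffset = sharedIndices.size() * sizeof(std::uint32_t);
        r.baseVertex = nextVertex;
        ranges.push_back(r);
        // Indices are copied unchanged; baseVertex relocates them at draw time.
        sharedIndices.insert(sharedIndices.end(),
                             m.indices.begin(), m.indices.end());
        nextVertex += static_cast<std::int32_t>(m.vertexCount);
    }
    return ranges;
}

// Rendering would then look roughly like this (GL 4.4 context assumed):
//   glBindVertexArray(vao);
//   glBindVertexBuffers(0, 1, &sharedVbo, &offset, &stride);
//   for (const DrawRange& r : ranges)
//       glDrawElementsBaseVertex(GL_TRIANGLES, r.indexCount, GL_UNSIGNED_INT,
//                                reinterpret_cast<const void*>(r.indexByteOffset),
//                                r.baseVertex);
```

The number of bind calls stays at two however many meshes end up in ranges; only the number of draw calls grows.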
The next step below that is to allow different meshes (or groups of meshes) to use different buffer objects. But they'd all still be sharing the same vertex format (aka: the state set by glVertexAttribFormat). And even here, you want to try to share buffer objects as much as possible. In this case, drawing a series of meshes that use the same vertex format begins with binding the VAO and then iterating over the meshes. For each mesh, you call glBindVertexBuffers, then draw it. If several meshes use common buffers, then you should sort the draws by buffer, so that each buffer only gets bound once.
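As a rough illustration of sorting by buffer, the sketch below groups draws that share a vertex buffer and counts how many binds the draw loop would issue. MeshDraw, sortByBuffer and countBufferBinds are hypothetical names, and each mesh is assumed to use a single vertex buffer; the GL calls are left as comments since they need a live context.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-mesh draw record; field names are illustrative only.
struct MeshDraw {
    std::uint32_t vertexBuffer; // GL buffer object name holding the vertices
    std::uint32_t indexCount;
    std::int32_t  baseVertex;
};

// Sort draws so meshes sharing a buffer are adjacent. stable_sort keeps the
// original submission order within each buffer.
void sortByBuffer(std::vector<MeshDraw>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const MeshDraw& a, const MeshDraw& b) {
                         return a.vertexBuffer < b.vertexBuffer;
                     });
}

// Walk the draws in order, rebinding only when the buffer changes.
// Returns the number of buffer binds that would be issued.
int countBufferBinds(const std::vector<MeshDraw>& draws) {
    int binds = 0;
    bool haveBound = false;
    std::uint32_t bound = 0;
    for (const MeshDraw& d : draws) {
        if (!haveBound || d.vertexBuffer != bound) {
            // Real code would call glBindVertexBuffers(...) here.
            bound = d.vertexBuffer;
            haveBound = true;
            ++binds;
        }
        // Real code would call glDrawElementsBaseVertex(...) here.
    }
    return binds;
}
```

With four draws that alternate between two buffers, the unsorted loop binds four times and the sorted one binds twice. A real engine would likely fold the buffer into a larger sort key alongside shader and texture state.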
If you aren't willing to at least control the format of your vertex data, then there's not much better that you can do, at least from this perspective.