
I've been trying to learn WebGL using these awesome tutorials. My goal is to make a very simple 2D game framework to replace the canvas-based jawsJS.

I basically just want to be able to create a bunch of sprites and move them around, and then maybe some tiles later.

I put together a basic demo that does this, but I hit a performance problem that I can't track down. Once I get to around 2,000 sprites on screen, the frame rate tanks and I can't work out why. Compared to this demo of the pixi.js WebGL framework, which starts losing frames at about 30,000 bunnies (on my machine), I'm a bit disappointed.

My demo (framework source) has 5002 sprites, two of which are moving, and the frame rate is in the toilet.

I've tried working through the pixi.js framework to try to work out what they do differently, but it's 500kloc and does so much more than mine that I can't work it out.

I found this answer that basically confirmed that what I'm doing is roughly right - my algorithm is pretty much the same as the one in the answer, but there must be more to it.

I have tried a few things so far, such as using just a single 'frame buffer' with a single shape defined, which then gets translated 5000 times, once for each sprite. This did help the frame rate a little, but got nowhere close to the pixi demo (and it meant that all sprites had to be the same shape!). I cut out all of the matrix maths for anything that doesn't move, so it's not that either. It all seems to come down to the drawArrays() function - it's just going really slow for me, but only for my demo!

I've also tried removing all of the texture based stuff, replacing the fragment shader with a simple block colour for everything instead. It made virtually no difference so I eliminated dodgy texture handling as a culprit.

I'd really appreciate some help in tracking down what incredibly stupid thing I've done!

Edit: I'm definitely misunderstanding something key here. I stripped the whole thing right back to basics, changing the vertex and fragment shaders to super simple:

attribute vec2 a_position;

void main() {
    gl_Position = vec4(a_position, 0, 1);
}

and:

void main() {
    gl_FragColor = vec4(0,1,0,1);  // green
}

then set the sprites up to draw to (0,0), (1,1).

With 5000 sprites, it takes about 5 seconds to draw a single frame. What is going on here?

MalphasWats
  • Update: I'm making progress but still have some work to do. Both the answers below have helped. By creating a SpriteBucket object that loads vertices from lots of sprites all at once and making 1 draw, I have increased the performance a lot. At first, 5000 sprites all loaded into one buffer was VERY slow, the slowest it's been actually, but by breaking the sprites up across multiple buckets in batches of ~500, I'm getting ~60fps again. I just need to put all the texture stuff back in now! – MalphasWats Jan 13 '15 at 10:21

2 Answers


A look at the frame calls using WebGLInspector or the experimental canvas inspector in Chrome reveals a completely unoptimized rendering loop.

You can and should use one and the same vertex buffer to render all your geometry; this way you can save the bindBuffer as well as the vertexAttribPointer calls. You can also save 99% of your texture binds, as you're repeatedly rebinding one and the same texture. A texture remains bound as long as you do not bind something else to the same texture unit.

Having a state cache is helpful to avoid binding data that is already bound.
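A minimal sketch of such a state cache follows. The `createStateCache` wrapper and its method names are hypothetical (they are not part of WebGL or of the question's framework), and the sketch assumes that every bind goes through the cache so that its bookkeeping stays in sync with the real GL state.

```javascript
// Hypothetical state cache: forwards a bind call to `gl` only when the
// requested object differs from the one the cache believes is bound.
function createStateCache(gl) {
  let boundArrayBuffer = null;
  let activeUnit = 0;              // WebGL starts with TEXTURE0 active
  const boundTextures = new Map(); // texture unit index -> bound texture

  return {
    // Returns true if a real gl.bindBuffer call was issued.
    bindArrayBuffer(buffer) {
      if (boundArrayBuffer === buffer) return false;
      gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
      boundArrayBuffer = buffer;
      return true;
    },
    // Returns true if a real gl.bindTexture call was issued.
    bindTexture2D(unit, texture) {
      if (boundTextures.get(unit) === texture) return false;
      if (activeUnit !== unit) {
        gl.activeTexture(gl.TEXTURE0 + unit);
        activeUnit = unit;
      }
      gl.bindTexture(gl.TEXTURE_2D, texture);
      boundTextures.set(unit, texture);
      return true;
    },
  };
}
```

With this in place, a loop that asks for the same texture for each of 5000 sprites should issue a single `bindTexture` call rather than 5000.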

Take a look at my answer here about the GPU as a state machine.

Once your rendering loop is optimized you can go ahead and consider the following things:

  • Use ANGLE_instanced_arrays extension
  • Avoid constructing data in your render loop.
  • Use an interleaved vertex buffer.
  • In some cases not using an indexbuffer also increases performance.
  • Check if you can shave off a few GPU cycles in your shaders
  • Break up your objects into chunks and do view frustum culling on the CPU side.
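To illustrate the interleaved vertex buffer item, here is a rough sketch that assumes each vertex carries a 2D position and a 2D texture coordinate. The function names, the `[x, y, u, v]` layout, and the attribute locations passed in are illustrative assumptions, not taken from the question's framework.

```javascript
// Assumed interleaved layout per vertex: [x, y, u, v], all 32-bit floats.
const FLOATS_PER_VERTEX = 4;
const BYTES_PER_FLOAT = Float32Array.BYTES_PER_ELEMENT;
const STRIDE = FLOATS_PER_VERTEX * BYTES_PER_FLOAT; // bytes between vertices

// Merge separate position and texcoord arrays into one Float32Array.
function interleave(positions, texCoords) {
  const vertexCount = positions.length / 2;
  const out = new Float32Array(vertexCount * FLOATS_PER_VERTEX);
  for (let i = 0; i < vertexCount; i++) {
    const o = i * FLOATS_PER_VERTEX;
    out[o] = positions[i * 2];
    out[o + 1] = positions[i * 2 + 1];
    out[o + 2] = texCoords[i * 2];
    out[o + 3] = texCoords[i * 2 + 1];
  }
  return out;
}

// Point both attributes at the currently bound ARRAY_BUFFER.
// positionLoc and texCoordLoc would come from gl.getAttribLocation.
function setupAttributes(gl, positionLoc, texCoordLoc) {
  gl.enableVertexAttribArray(positionLoc);
  gl.vertexAttribPointer(positionLoc, 2, gl.FLOAT, false, STRIDE, 0);
  gl.enableVertexAttribArray(texCoordLoc);
  gl.vertexAttribPointer(texCoordLoc, 2, gl.FLOAT, false, STRIDE, 2 * BYTES_PER_FLOAT);
}
```

Because both attributes read from the same buffer, only one `bindBuffer` call is needed before `setupAttributes`, and the pointers can stay in place for as long as that buffer remains bound.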
LJᛃ
  • Thank you. This does mostly make sense (at the start of the week I had never done any GL programming!). I did (accidentally!) have a version that just used a single buffer and translated the same 6 vertices for each object to draw and that was marginally better but still a long way from what I was hoping for. I think my problem is related though and I'm going to try loading as many vertices into one buffer as I can. I'm concerned that I can't work out how to just translate one object though - I'd really like to avoid having to calculate everything javascript side. – MalphasWats Jan 13 '15 at 08:27
  • I'm going to mark this one as the correct answer. I needed some pointers of where to look and this one gives a pretty good idea of where to start. I'm going to write my solution up a little more as well. – MalphasWats Jan 14 '15 at 09:25
  • Once you're done optimizing your render loop you may apply @NoHarmInTrying's approach. But be aware that the performance of this approach heavily depends on the amount of dynamic objects you have. It would be no performance win when you end up transforming all vertices every frame on the GPU. – LJᛃ Jan 14 '15 at 18:23

The problem is probably this line in render: glixl.context.uniformMatrix3fv(glixl.matrix, false, this.matrix);.

In my experience, passing uniforms for each model is very slow in WebGL, and I was unable to maintain 60 FPS after ~1,000 unique models. Unfortunately there are no uniform buffers in WebGL 1 to alleviate this problem.

I solved my problem by calculating all the vertex positions on the CPU and drawing them all using one drawArrays call. This should work if the vertex count isn't overwhelming. I can draw 2k moving + rotating cubes at 60 FPS. I don't recall exactly how many cubes you can draw at 60 FPS, but it is quite a bit higher than 2k. If that isn't fast enough, then you have to look into instanced drawing (drawArraysInstancedANGLE from the ANGLE_instanced_arrays extension). Basically, store all the matrices in an array buffer and draw all your models using one instanced draw call with the correct offsets and such.
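A minimal sketch of the CPU-side approach for 2D sprites is shown here. It assumes axis-aligned sprites described by a hypothetical `{x, y, width, height}` shape, a vertex buffer already allocated large enough with `bufferData`, and attribute pointers already set up; `fillBatch` and `drawBatch` are illustrative names, not functions from pixi.js or the question's framework.

```javascript
const VERTS_PER_SPRITE = 6; // two triangles per quad
const FLOATS_PER_SPRITE = VERTS_PER_SPRITE * 2; // x and y per vertex

// Write one quad (two triangles) per sprite into `out`.
// Returns the number of vertices written.
function fillBatch(sprites, out) {
  let o = 0;
  for (const s of sprites) {
    const x0 = s.x, y0 = s.y;
    const x1 = s.x + s.width, y1 = s.y + s.height;
    // First triangle
    out[o++] = x0; out[o++] = y0;
    out[o++] = x1; out[o++] = y0;
    out[o++] = x0; out[o++] = y1;
    // Second triangle
    out[o++] = x0; out[o++] = y1;
    out[o++] = x1; out[o++] = y0;
    out[o++] = x1; out[o++] = y1;
  }
  return o / 2;
}

// Upload the whole batch once and issue a single draw call.
// `scratch` is a reusable Float32Array, so nothing is allocated per frame
// apart from the lightweight subarray view.
function drawBatch(gl, buffer, sprites, scratch) {
  const vertexCount = fillBatch(sprites, scratch);
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, scratch.subarray(0, vertexCount * 2));
  gl.drawArrays(gl.TRIANGLES, 0, vertexCount);
  return vertexCount;
}
```

A caller would allocate `scratch` once, for example `new Float32Array(maxSprites * FLOATS_PER_SPRITE)`, and reuse it every frame. The update on the question reports that splitting 5000 sprites into batches of about 500 performed better than one large buffer, so the batch size is worth measuring rather than assuming.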

EDIT: also to the OP, if you want to see how PIXI does the vertex update rendering (NOT uniform instancing), see https://github.com/GoodBoyDigital/pixi.js/blob/master/src/pixi/renderers/webgl/utils/WebGLFastSpriteBatch.js.

No Harm In Trying
  • Thanks for the answer. If I simply comment out that line, nothing gets drawn (obviously), the frame rate goes up a couple of points, but not much. On the machine I'm using now, I get 8 fps with 5000 sprites with the `uniformMatrix3fv()` call, it goes up to ~10 if I comment it out. The pixi demo goes to ~8000 sprites before it starts dropping on this machine. – MalphasWats Jan 12 '15 at 20:56
  • I see, the bottleneck is elsewhere then. I suggest taking out the lines in your render function one by one to see which one has the biggest impact. It is likely to be one of those calls that changes the GPU state. bindTexture maybe. Do a CPU profile if you haven't yet to see if you are CPU bound. – No Harm In Trying Jan 12 '15 at 21:03
  • Uniform binding is certainly **not** *the problem*; it's pretty much the only thing that is not part of the problem, as [it's the fastest thing](http://stackoverflow.com/questions/26564104/use-single-vertex-buffer-or-many/26568792#26568792) one can do, be it WebGL or not. – LJᛃ Jan 12 '15 at 23:44
  • While it is true that it is the fastest state changing operation, you are forgetting about the fact that it needs to be called many times (+ individual draw calls) if uniform binding is used to instance. Whereas if you do vertex updates, it is one bufferSubData call and one draw call. The problem for the OP right now is probably that he is doing too many state changes, but this is one area he could optimize further should he need to. – No Harm In Trying Jan 12 '15 at 23:53
  • Your answer presents it as the problem, which imho it is not. Following your answer the OP would transform each and every vertex buffer on the CPU side and update it. This would just replace the uniform call with a `bufferData` call and effectively trash the GPU caching that is going on behind the curtains. In addition to that, saying that "*you can draw 2k cubes at 60FPS*" does not really tell us anything, as we do not know about your hardware or software or how many ms your **render** loop consumed. – LJᛃ Jan 13 '15 at 00:05
  • It could have been the problem had the OP not had other bottlenecks. The way I described is how PIXI does it: https://github.com/GoodBoyDigital/pixi.js/blob/master/src/pixi/renderers/webgl/utils/WebGLFastSpriteBatch.js. I am merely sharing my experience on facing this same problem and how I tackled it. There are very few up to date answers on issues regarding performance. – No Harm In Trying Jan 13 '15 at 00:10
  • Thank you, the link is helpful. I know I've had a couple of downvotes for my question because it's a little bit vague and probably comes across as lazy, but it's difficult to find examples that deal with how to work with multiple, different objects. I also have no idea what sort of performance I should expect. The Pixi link is helpful though - they load up a buffer with a whole batch of objects and draw them all at once. I'm going to try that and see what I get. It makes sense from a tile map type engine anyway, and for the most part I can probably do it once and translate it for position. – MalphasWats Jan 13 '15 at 08:11