OpenGL performance for 10,000 static cubes

Question

I'm running the following Scala code. It compiles a single display list of 10,000 cubes and then draws it in the display loop, driven by an animator that runs as fast as it can. But the FPS is only around 20. I had thought that display lists would handle this very quickly. I need to be able to display anywhere from 10k to hundreds of thousands of objects. Is there a better way to do so? The display loop does little more than call gluLookAt and glCallList (see display, the last method).

I'm using JOGL 2.0-rc5 from jogamp.org which says it supports "OpenGL 1.3 - 3.0, 3.1 - 3.3, ≥ 4.0, ES 1.x and ES 2.x + nearly all vendor extensions"

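// Imports for JOGL 2.0-rc5, which uses the javax.media.opengl packages
import javax.media.opengl.{GL, GL2, GLAutoDrawable, GLCapabilities, GLEventListener, GLProfile}
import javax.media.opengl.awt.GLCanvas
import javax.media.opengl.fixedfunc.GLMatrixFunc
import javax.media.opengl.glu.GLU
import com.jogamp.opengl.util.Animator
import javax.swing.JFrame

class LotsOfCubes extends GLEventListener {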
  def show() = {
    val glp = GLProfile.getDefault();
    val caps = new GLCapabilities(glp);
    val canvas = new GLCanvas(caps);
    canvas.addGLEventListener(this);

    val frame = new JFrame("AWT Window Test");
    frame.setSize(300, 300);
    frame.add(canvas);
    frame.setVisible(true);
  }

  override def init(drawable: GLAutoDrawable) {
    val gl = drawable.getGL().getGL2()
    gl.glEnable(GL.GL_DEPTH_TEST)

    gl.glNewList(21, GL2.GL_COMPILE)
    var i = -10.0f
    var j = -10.0f
    while (i < 10.0f) {
      while (j < 10.0f) {
        drawItem(gl, i, j, 0.0f, 0.08f)
        j += 0.1f
      }
      i += 0.1f
      j = -10f
    }
    gl.glEndList()

    val an = new Animator(drawable);
    drawable.setAnimator(an);
    an.setUpdateFPSFrames(100, System.out)
    an.start();
  }

  override def dispose(drawable: GLAutoDrawable) {
  }

  override def reshape(drawable: GLAutoDrawable, x: Int, y: Int, width: Int, height: Int) {
    val gl = drawable.getGL().getGL2();
    val glu = new GLU
    gl.glMatrixMode(GLMatrixFunc.GL_PROJECTION);
    gl.glLoadIdentity();
    glu.gluPerspective(10, 1, -1, 100);
    gl.glViewport(0, 0, width, height);
    gl.glMatrixMode(GLMatrixFunc.GL_MODELVIEW);
  }

  def drawBox(gl: GL2, size: Float) {
    import Global._
    gl.glBegin(GL2.GL_QUADS);
    for (i <- 5 until -1 by -1) {
      gl.glNormal3fv(boxNormals(i), 0);
      val c = colors(i);
      gl.glColor3f(c(0), c(1), c(2))
      var vt: Array[Float] = boxVertices(boxFaces(i)(0))
      gl.glVertex3f(vt(0) * size, vt(1) * size, vt(2) * size);
      vt = boxVertices(boxFaces(i)(1));
      gl.glVertex3f(vt(0) * size, vt(1) * size, vt(2) * size);
      vt = boxVertices(boxFaces(i)(2));
      gl.glVertex3f(vt(0) * size, vt(1) * size, vt(2) * size);
      vt = boxVertices(boxFaces(i)(3));
      gl.glVertex3f(vt(0) * size, vt(1) * size, vt(2) * size);
    }
    gl.glEnd();
  }

  def drawItem(gl: GL2, x: Float, y: Float, z: Float, size: Float) {
    gl.glPushMatrix()
    gl.glTranslatef(x, y, z);
    gl.glRotatef(0.0f, 0.0f, 1.0f, 0.0f); // Rotate The cube around the Y axis
    gl.glRotatef(0.0f, 1.0f, 1.0f, 1.0f);
    drawBox(gl, size);
    gl.glPopMatrix()
  }

  override def display(drawable: GLAutoDrawable) {
    val gl = drawable.getGL().getGL2()
    val glu = new GLU
    gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT)
    gl.glLoadIdentity()
    glu.gluLookAt(0.0, 0.0, -100.0f,
      0.0f, 0.0f, 0.0f,
      0.0f, 1.0f, 0.0f)
    gl.glCallList(21)
  }
}

What hardware are you using? Is double buffering enabled? – dan_l, Mar 27 '12 at 04:44
I added `caps.setDoubleBuffered(true)` and it didn't affect performance. As for hardware, I have a mid-range nvidia graphics card from a year or two ago. CPUs are 2 dual-core opterons from years ago. — mentics, Mar 27 '12 at 04:52
Second, please specify the OpenGL version you use. Does `GL2` indicate OpenGL 2? _Oh_, this is [JOGL](http://jogamp.org/jogl/www/), and [GL2](http://download.java.net/media/jogl/jogl-2.x-docs/javax/media/opengl/GL2.html) means this is OpenGL *3*. Searching for _scala GL2_ didn't result in many hits... – Stefan Hanke, Mar 27 '12 at 04:56
Note: When you use `glNewList`, you're supposed to pass it a display list name returned by `glGenLists`. You don't really *have* to, but it's common courtesy to allocate what you want. – Nicol Bolas, Mar 27 '12 at 05:21
You might want to replace the for comprehension in `drawBox` with a while loop. `drawBox` seems to be called very often, and for comprehensions are not that performant. – drexin, Mar 27 '12 at 06:33
You're right, for comprehensions in Scala are slow. However, drawBox is only called during the creation of the display list. So, it shouldn't affect the FPS at all. — mentics, Mar 27 '12 at 06:41

score 10 · Accepted Answer · answered Mar 27 '12 at 04:43

You may want to think about using a vertex buffer object (VBO), which is a way to store drawing information for faster rendering.

See here for an overview:

http://www.opengl.org/wiki/Vertex_Buffer_Object

prelic


Why does it talk about that page having deprecated stuff on it? Are VBO's deprecated, or what is on that page? It's confusing. – mentics Mar 27 '12 at 05:10
@taotree: It's talking about `glVertexPointer`, `glTexCoordPointer` and other stuff. That's been removed. Buffer objects are still there. I haven't gotten around to cleaning up that page. – Nicol Bolas Mar 27 '12 at 05:15
I tried this example that uses VBO's: http://wadeawalker.wordpress.com/2010/10/17/tutorial-faster-rendering-with-vertex-buffer-objects/ and it was able to do 1 million simple shapes at about 28 fps. – mentics Mar 27 '12 at 06:27

score 4 · Answer 2 · answered Mar 27 '12 at 05:08

If you store the vertex information in a vertex buffer object and upload it to OpenGL, you will probably see a great increase in performance, particularly if you are drawing static objects. This is because the vertex data stays on the graphics card, rather than being transferred from CPU memory every time it is drawn.
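
A minimal sketch of the CPU side of this approach, assuming plain x, y, z positions stored as floats. The names `StaticVertexData`, `toDirectBuffer`, `sizeInBytes`, and `vertexCount` are hypothetical and do not come from the question's code. The JOGL calls appear only as comments because they need a live `GL2` context; they indicate where a one-time upload would plausibly happen.

```scala
import java.nio.{ByteBuffer, ByteOrder, FloatBuffer}

// Hypothetical helper: packs static vertex positions for a one-time upload.
object StaticVertexData {
  val FloatsPerVertex = 3 // x, y, z
  val BytesPerFloat = 4

  /** Copies coordinates into a direct, native-order buffer, the kind of
    * buffer JOGL expects when handing data to the driver. */
  def toDirectBuffer(coords: Array[Float]): FloatBuffer = {
    require(coords.length % FloatsPerVertex == 0, "coords must hold x, y, z triples")
    val buf = ByteBuffer
      .allocateDirect(coords.length * BytesPerFloat)
      .order(ByteOrder.nativeOrder())
      .asFloatBuffer()
    buf.put(coords)
    buf.rewind() // reset position to 0 so a consumer reads from the start
    buf
  }

  /** Size of the data in bytes, as a buffer upload call would need it. */
  def sizeInBytes(buf: FloatBuffer): Long = buf.capacity().toLong * BytesPerFloat

  /** Number of vertices a draw call over this buffer would cover. */
  def vertexCount(buf: FloatBuffer): Int = buf.capacity() / FloatsPerVertex

  // Assumed usage with a current GL2 context `gl` (not executed here):
  //   val ids = new Array[Int](1)
  //   gl.glGenBuffers(1, ids, 0)
  //   gl.glBindBuffer(GL.GL_ARRAY_BUFFER, ids(0))
  //   gl.glBufferData(GL.GL_ARRAY_BUFFER, sizeInBytes(buf), buf, GL.GL_STATIC_DRAW)
  // Each frame would then bind the buffer and issue a draw call such as glDrawArrays.
}
```

If the upload is done once, for example in `init`, the per-frame work is reduced to binding the buffer and issuing a draw call, which is the property this answer relies on for static objects.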

newprogrammer


I thought display lists store the data on the graphics card. – mentics Mar 27 '12 at 05:10

score 1 · Answer 3 · answered Jun 13 '12 at 12:02

You create a display list in which you call drawItem for each cube. Inside drawItem, you push and pop the current transformation matrix for each cube, and in between you translate and rotate the cube to place it correctly. In principle that could perform well, since the transformations of the cube coordinates could be precomputed and optimized by the driver.

When I tried to do the same thing (displaying lots of cubes, as in Minecraft) but without rotation, using only glPushMatrix()/glPopMatrix() and glTranslatef(), I found that my driver did NOT perform these optimizations, i.e. it did not eliminate the unnecessary matrix pushes, pops, and multiplications. For about 10-20K cubes I got only around 40 fps, and for 200K cubes only about 6-7 fps.

I then applied the translations manually by adding the respective offset vectors directly to the vertices of my cubes, so that the display list no longer contained any matrix push/pop or glTranslatef calls. That gave a huge speedup: my code ran about 70 times as fast.
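
The manual translation described here can be sketched in plain Scala, under the assumption that the geometry is a flat array of x, y, z triples. `BakedGeometry`, `bake`, and `grid` are hypothetical names for illustration, not part of the question's code or of JOGL.

```scala
// Hypothetical helper: applies per-object translations on the CPU, once,
// so the display list needs no glPushMatrix/glTranslatef/glPopMatrix.
object BakedGeometry {

  /** Returns the template repeated once per offset, with that offset
    * added to every vertex of the copy. */
  def bake(template: Array[Float], offsets: Seq[(Float, Float, Float)]): Array[Float] = {
    require(template.length % 3 == 0, "template must hold x, y, z triples")
    val out = new Array[Float](template.length * offsets.length)
    var w = 0 // write index into out
    for ((ox, oy, oz) <- offsets) {
      var r = 0 // read index into template
      while (r < template.length) {
        out(w) = template(r) + ox
        out(w + 1) = template(r + 1) + oy
        out(w + 2) = template(r + 2) + oz
        w += 3
        r += 3
      }
    }
    out
  }

  /** An n-by-n grid of offsets on the z = 0 plane. Positions are computed
    * from integer indices rather than by repeatedly adding a float step. */
  def grid(n: Int, spacing: Float): Seq[(Float, Float, Float)] =
    for (i <- 0 until n; j <- 0 until n) yield (i * spacing, j * spacing, 0.0f)
}
```

The resulting array could then be emitted with plain glVertex3f calls inside the display list, which removes the per-cube matrix operations that this answer found the driver did not optimize away.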
