4

I'm working on a Micromouse simulation application built with OpenGL, and I have a hunch that I'm not doing things properly. In particular, I'm suspicious of the way I'm getting my (mostly static) graphics to refresh at a close-to-constant framerate (60 FPS). My approach is as follows:

1) Start a timer
2) Draw my shapes and text (about a thousand of them):

glBegin(GL_POLYGON);
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
    glVertex2f(coordinates.first, coordinates.second);
}
glEnd();

and

glPushMatrix();
glScalef(scaleX, scaleY, 0);
glTranslatef(coordinates.first * 1.0/scaleX, coordinates.second * 1.0/scaleY, 0);
for (int i = 0; i < text.size(); i += 1) {
    glutStrokeCharacter(GLUT_STROKE_MONO_ROMAN, text.at(i));
}
glPopMatrix();

3) Call

glFlush();

4) Stop the timer
5) Sleep for (1/FPS - duration) seconds
6) Call

glutPostRedisplay();

The "problem" is that the above approach really hogs my CPU - the process is using something like 96-100%. I know that there isn't anything inherently wrong with using lots of CPU, but I feel like I shouldn't be using that much all of the time.

The kicker is that most of the graphics don't change from frame to frame. It's really just a single polygon moving over (and covering up) some static shapes. Is there any way to tell OpenGL to only redraw what has changed since the previous frame (with the hope it would reduce the number of glxxx calls, which I've deemed to be the source of the "problem")? Or, better yet, is my approach to getting my graphics to refresh even correct?

mackorone
  • If you do not want to switch from `glBegin/glEnd` to VBOs: `getOpenGlCoordinates` and `coordinates.first, coordinates.second` look like a function call and member accesses, which cost extra work per vertex. That should not be inside `glBegin/glEnd`; it is much faster to pass a raw array of data ... also `glVertex2fv` is faster than `glVertex2f` if raw data is passed ... – Spektre Jul 15 '19 at 07:44

2 Answers

6

First and foremost, the biggest CPU hog in OpenGL is immediate mode… and you're using it (glBegin, glEnd). The problem with immediate mode is that every single vertex requires a couple of OpenGL calls, and because OpenGL uses thread-local state, each and every one of those calls must go through some indirection. So the first step would be getting rid of that.
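
For illustration, a minimal sketch of what that first step could look like with a client-side vertex array (still legacy OpenGL, but far fewer calls per frame); the vector name and the two-floats-per-vertex layout are assumptions, not your actual data:

#include <GL/glut.h>
#include <vector>

// Filled once (or whenever the geometry actually changes), outside the draw loop:
// x0, y0, x1, y1, ...
std::vector<float> polygonVertices;

void drawPolygonFromArray() {
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, polygonVertices.data());
    glDrawArrays(GL_POLYGON, 0, static_cast<GLsizei>(polygonVertices.size() / 2));
    glDisableClientState(GL_VERTEX_ARRAY);
}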

The next issue is how you're timing your display. If low latency between user input and display is not your ultimate goal, the standard approach would be to set up the window for double buffering, enable V-Sync with a swap interval of 1, and do a buffer swap (glutSwapBuffers) once the frame is rendered. Exactly what blocks, and where, is implementation dependent (unfortunately), but you're more or less guaranteed to hit your screen refresh frequency exactly, as long as your renderer is able to keep up (i.e. rendering a frame takes less time than a screen refresh interval).
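
As a rough sketch of that setup with GLUT (double buffering plus a swap at the end of the display callback; actually enabling V-Sync with a swap interval of 1 is platform specific, e.g. via GLX on Linux, and is only noted in a comment here):

#include <GL/glut.h>

void display() {
    glClear(GL_COLOR_BUFFER_BIT);
    // ... draw all shapes and text into the hidden back buffer ...
    glutSwapBuffers();     // paced by V-Sync when the driver's swap interval is 1
    glutPostRedisplay();   // queue the next frame; the swap provides the pacing
}

int main(int argc, char** argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);  // double buffered instead of GLUT_SINGLE
    glutCreateWindow("micromouse");               // hypothetical window title
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}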

glutPostRedisplay merely sets a flag for the main loop to call the display function once no further events are pending, so timing a frame redraw through it is not very accurate.
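
If you do want to drive the redraw from a timer rather than from the swap interval, GLUT's timer callback is a more accurate place for it than sleeping inside the display function; a minimal sketch (the 16 ms interval is just a rough stand-in for 60 FPS):

void onTimer(int /*value*/) {
    glutPostRedisplay();              // ask GLUT to call the display function again
    glutTimerFunc(16, onTimer, 0);    // re-arm the timer for the next frame
}

// registered once during setup, e.g. in main():
//   glutTimerFunc(16, onTimer, 0);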

Last but not least, you may simply be misled by the way Windows accounts CPU time: time spent in driver context, which includes blocking while waiting for V-Sync, is counted as consumed CPU time even though it is in fact interruptible sleep. However, you wrote that you already sleep in your code, which would rule that out, because the go-to approach to get a more reasonable accounting would be adding a Sleep(1) before or after the buffer swap.
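
For completeness, a sketch of that workaround using the standard library's sleep rather than the Win32 Sleep(1), since you are not necessarily on Windows:

#include <chrono>
#include <thread>

// Called right before or after the buffer swap, so time spent blocking is
// accounted as sleep rather than as busy CPU.
void yieldBriefly() {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}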

datenwolf
  • Just as a point of clarification, I'm developing on Ubuntu, not Windows. I'll look into avoiding this "immediate mode" and see if that works. Thanks! – mackorone May 21 '15 at 18:55
  • @mackorone: You already seem to have your vertices organized in some array. The easiest way would be to tell OpenGL about that array (`glVertexPointer`, `glEnableClientState(GL_VERTEX_ARRAY)`) and just draw the full contents of it with `glDrawArrays`. – datenwolf May 21 '15 at 18:59
  • Yes, my vertices are stored in a vector. So that means OpenGL can make sense of all of those vertices at once? I'm new to this double buffering so I'm still trying to figure out exactly what I need to do... – mackorone May 21 '15 at 19:08
  • @mackorone: Double buffering is a wholly different thing. Essentially, with double buffering you first draw to a "hidden" back buffer, and only once you're done do you tell the system to make that back buffer the visible front buffer (swap front and back buffers). Yes, OpenGL can "make sense" of all these vertices at once. In fact, the whole `glVertex` calling business has been out of date since OpenGL 1.1 introduced vertex array support almost 20 years ago. – datenwolf May 21 '15 at 19:50
2

I found that putting the render thread to sleep helps reduce CPU usage, in my case from 26% to around 8%:

#include <chrono>
#include <iostream>
#include <thread>

void render_loop(){
  ...
  auto const start_time = std::chrono::steady_clock::now();
  auto const wait_time = std::chrono::milliseconds{ 17 };
  auto next_time = start_time + wait_time;
  while(true){
    ...
    // wake up roughly every 17 ms, which is approximately 60 frames per second
    auto then = std::chrono::high_resolution_clock::now();
    std::this_thread::sleep_until(next_time);

    ...rendering jobs

    // measures the full frame period (sleep + rendering), so it should hover around 17 ms
    auto elapsed_time =
        std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - then);
    std::cout << "ms: " << elapsed_time.count() << '\n';
    next_time += wait_time;
  }
}

I thought about trying to measure the frame rate while the thread is asleep, but there isn't any reason for my use case to attempt that. The result averaged around 16 ms, so I thought it was good enough.

Inspired by this post

  • While `std::chrono::milliseconds` is good, I found better precision using `::nanoseconds` (and make sure you are using a 64-bit type, like doubles). It is noticeable. However, not all systems have such a precise clock; if one is available, use it, otherwise it silently defaults to the standard one. That being said, waiting can be done via nanosleep. – user2262111 Jun 17 '20 at 17:05
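
For reference, a minimal sketch of the nanosleep variant mentioned in the comment above (POSIX only; the 16.7 ms value is just an approximation of one 60 Hz frame period):

#include <time.h>

void wait_one_frame() {
    timespec ts{};
    ts.tv_sec = 0;
    ts.tv_nsec = 16'700'000;      // ~16.7 ms, roughly one 60 Hz frame
    nanosleep(&ts, nullptr);      // may return early if interrupted by a signal
}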