
I want to calculate the per-row minimum of a matrix of floats in GLSL, in the browser. The matrix has about 1000 rows and 4000 columns.
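A plain CPU reference is handy for checking what the shader produces. Here is a minimal TypeScript sketch of that reference. It assumes the matrix is stored row-major in a `Float32Array`, and `rowMinima` is a hypothetical helper name, not something from the question:

```typescript
// CPU reference for the per-row minimum of a row-major matrix.
// Assumption: `data` holds rows * cols floats, laid out row by row.
function rowMinima(data: Float32Array, rows: number, cols: number): Float32Array {
  if (data.length !== rows * cols) {
    throw new Error(`expected ${rows * cols} values, got ${data.length}`);
  }
  const out = new Float32Array(rows);
  for (let row = 0; row < rows; row++) {
    let m = Infinity; // larger than any finite value, so the first element wins
    const start = row * cols;
    for (let col = 0; col < cols; col++) {
      const a = data[start + col];
      m = a < m ? a : m;
    }
    out[row] = m;
  }
  return out;
}

// Example: a 2 x 3 matrix.
const mins = rowMinima(new Float32Array([3, 1, 2, 5, -4, 0]), 2, 3);
console.log(Array.from(mins)); // [1, -4]
```

Reading the shader's output back (for example with `gl.readPixels`) and comparing it against this array is a quick way to catch indexing mistakes.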

Building on previous answers (see this), I used a for loop. However, I would like to use a uniform for the upper bound, which is not possible in WebGL's GLSL ES 1.0, where a loop bound must be a constant expression. I need this because the row length is only known at run time, after the fragment shader source is fixed, and I'd like to avoid messing with #defines.

So I found that this workaround, a fixed loop length combined with an if/break controlled by a uniform, works OK:

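precision highp float;  // highp so that indices up to ~4 million stay exact

#define MAX_INT 65536

uniform int r;  // row length, set from JavaScript at run time

// getPoint, values and dimensions are defined elsewhere (as in the previous answers)

void main(void) {
  float rowStart = floor(gl_FragCoord.y) * float(r);
  float m = getPoint(values, dimensions, rowStart).x;  // seed with the row's first element
  for (int i = 1; i < MAX_INT; ++i) {
    if (i >= r) { break; }  // test before reading, so exactly r elements are visited
    float a = getPoint(values, dimensions, rowStart + float(i)).x;
    m = min(m, a);
  }
}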
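Where the `if (i >= r)` test sits matters. If it comes after the element is read, the loop body runs `r + 1` times, so each row also reads the first element of the next row (or past the end of the data on the last row). A small TypeScript simulation of one fragment's loop makes this visible; `visitedIndices` and `breakFirst` are illustrative names, not part of the original shader:

```typescript
// CPU simulation of the question's loop for a single fragment (one row).
// MAX_ITER mirrors the shader's MAX_INT constant; `breakFirst` selects whether
// the `i >= r` check runs before or after the element is read.
const MAX_ITER = 65536;

function visitedIndices(row: number, r: number, breakFirst: boolean): number[] {
  const visited: number[] = [];
  for (let i = 0; i < MAX_ITER; i++) {
    if (breakFirst && i >= r) break;
    visited.push(row * r + i); // same as floor(gl_FragCoord.y) * r + i
    if (!breakFirst && i >= r) break;
  }
  return visited;
}

// Row 0 with r = 4 should cover indices 0..3.
console.log(visitedIndices(0, 4, false)); // [0, 1, 2, 3, 4]: index 4 belongs to row 1
console.log(visitedIndices(0, 4, true)); // [0, 1, 2, 3]
```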

Now the question: does this approach have big drawbacks? Is there something odd about what I'm doing, or something I'm missing?

simone
  • Depending on the GPU architecture, breaking (exiting) a loop does not free that core: it waits until the associated parallel cores have also exited their loops. ES3 loops with conditional exits are subject to the same constraint. Early exits are still worth implementing (again depending on the architecture): an idle core uses a fraction of the power of a busy one, and less power means less heat, a higher clock speed (if available) and higher overall throughput. This is the abridged, 600-character comment version of what is going on. The rule of thumb: the longest-executing core sets the time for all parallel cores. – Blindman67 Sep 23 '20 at 13:35
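The comment's rule of thumb can be written down as a toy cost model. This TypeScript sketch is a deliberate simplification (real GPUs differ in group size and scheduling), and `groupCost` is a hypothetical helper:

```typescript
// Toy model of lockstep (SIMD) execution: lanes in a group advance together,
// so the group takes as many steps as its slowest lane; lanes that finish early idle.
function groupCost(iterationsPerLane: number[]): { steps: number; idleLaneSteps: number } {
  const steps = Math.max(...iterationsPerLane);
  const busy = iterationsPerLane.reduce((sum, n) => sum + n, 0);
  return { steps, idleLaneSteps: steps * iterationsPerLane.length - busy };
}

// One lane exits early but still waits for the slowest one.
console.log(groupCost([10, 40, 40, 40])); // { steps: 40, idleLaneSteps: 30 }
// Every lane exits on the same iteration.
console.log(groupCost([40, 40, 40, 40])); // { steps: 40, idleLaneSteps: 0 }
```

In the question every fragment reads the same uniform `r`, so under this model all lanes break on the same iteration and the early exit should not cause divergence.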

1 Answer

I believe, but am not entirely sure, that the only risk is that some driver/GPU will still execute the full-length loop.
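To put a number on that risk using the figures from the question (1000 rows, 4000 columns, a constant bound of 65536), here is a back-of-the-envelope count in TypeScript. It assumes one texture fetch per loop iteration and that a driver which ignores the break still performs every fetch; both are simplifications:

```typescript
// Texture fetches for the whole 1000 x 4000 matrix under two driver behaviours.
const ROWS = 1000;
const COLS = 4000; // the uniform r
const MAX_INT = 65536; // the constant loop bound

const fetchesIfBreakHonoured = ROWS * COLS; // each row stops after r reads
const fetchesIfFullyMasked = ROWS * MAX_INT; // every iteration runs, results masked off

console.log(fetchesIfBreakHonoured); // 4000000
console.log(fetchesIfFullyMasked); // 65536000
console.log((fetchesIfFullyMasked / fetchesIfBreakHonoured).toFixed(1)); // 16.4
```

So in the worst case the shader would do roughly 16 times the necessary work; keeping `MAX_INT` only slightly above the largest expected row length limits that penalty.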

As an example, imagine this loop:

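precision mediump float;

uniform sampler2D tex;
uniform int limit;

void main() {
  float sum = 0.0;  // GLSL ES 1.0 has no implicit int-to-float conversion
  for (int i = 0; i < 3; ++i) {
    sum += texture2D(tex, vec2(float(i) / 3.0, 0.0)).r;
    if (i >= limit) {
      break;
    }
  }
  gl_FragColor = vec4(sum);
}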

A driver could legally rewrite it like this:

precision mediump float;

uniform sampler2D tex;
uniform int limit;

void main() {
  float sum = 0.0;
  for (int i = 0; i < 3; ++i) {
    float temp = texture2D(tex, vec2(float(i) / 3.0, 0.0)).r;
    sum += temp * step(float(i), float(limit));  // 1.0 while i <= limit, else 0.0
  }
  gl_FragColor = vec4(sum);
}

This version has no branches. I don't know whether any drivers/GPUs without support for conditionals still exist, but the reason for requiring a constant integer expression as the loop bound is that the branches can then be removed and/or the loop unrolled at compile time, if the driver/GPU decides to do either. Fully unrolled, it becomes:

precision mediump float;

uniform sampler2D tex;
uniform int limit;

void main() {
  float sum = 0.0;
  sum += step(0.0, float(limit)) * texture2D(tex, vec2(0.0 / 3.0, 0.0)).r;
  sum += step(1.0, float(limit)) * texture2D(tex, vec2(1.0 / 3.0, 0.0)).r;
  sum += step(2.0, float(limit)) * texture2D(tex, vec2(2.0 / 3.0, 0.0)).r;
  gl_FragColor = vec4(sum);
}
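To check that the three shader variants above compute the same value, here is a CPU model of them in TypeScript. The texel values in `samples` are made up, and `step` reimplements GLSL's `step(edge, x)` (0.0 when `x < edge`, otherwise 1.0). The model does not count work, but note that the masked and unrolled forms always perform all three fetches:

```typescript
// GLSL step(edge, x): 0 when x < edge, otherwise 1.
const step = (edge: number, x: number): number => (x < edge ? 0 : 1);

// Variant 1: loop with an early break (the sample is read before the check).
function withBreak(samples: number[], limit: number): number {
  let sum = 0;
  for (let i = 0; i < 3; i++) {
    sum += samples[i];
    if (i >= limit) break;
  }
  return sum;
}

// Variant 2: branch-free loop, every sample read and then masked by step().
function masked(samples: number[], limit: number): number {
  let sum = 0;
  for (let i = 0; i < 3; i++) {
    sum += samples[i] * step(i, limit);
  }
  return sum;
}

// Variant 3: the same masking, fully unrolled.
function unrolled(s: number[], limit: number): number {
  return step(0, limit) * s[0] + step(1, limit) * s[1] + step(2, limit) * s[2];
}

const samples = [0.25, 0.5, 0.125]; // made-up texel values
for (const limit of [0, 1, 2, 5]) {
  // All three columns agree for every non-negative limit.
  console.log(limit, withBreak(samples, limit), masked(samples, limit), unrolled(samples, limit));
}
// Edge case: with a negative limit the break version still adds samples[0].
console.log(withBreak(samples, -1), masked(samples, -1)); // 0.25 0
```

The edge case only matters if the uniform can be negative; for any `limit >= 0` the rewrite preserves the result exactly.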

Also, as an aside, the specific example in your question never writes gl_FragColor, so it produces no output and most drivers would turn the entire shader into a no-op.

gman