
I'm looking at a lot of OpenGL rendering code that converts a COLORREF to float[3] or float[4] with something like `rgb[0] = ((float)GetRValue(col)) / 255.0f;`, and this is ringing performance alarm bells for me. The 255.0f can clearly be an int, but why not use the faster division by 256?

Our chief graphics programmer took exception to this suggestion because red = 255 going to anything but 1.0f is "morally wrong" but what are the actual consequences of slightly lower float values? Anything actually discernible?

Nicol Bolas
Ian Bell
  • Interesting read on the topic: http://kaba.hilvi.org/homepage/blog/range/RangeConversion.pdf – datenwolf Apr 04 '19 at 11:44
  • 2
    "The 255.0f can clearly be an int" no, unless you want to convert all values to 0, except 255. " why not use faster division by 256?" you need to make a float division here, so why do you expect 256 to be faster? Last, do you think color int=>float conversion is a bottleneck when rendering 3D scenes using OpenGL? – L.C. Apr 04 '19 at 11:58
  • 1
    float/int is float, so 100.0f/256 = .3906f, not 0. I expect float/256 to be faster than float/255 because the optimizer can spot exponent reduction division. – Ian Bell Apr 04 '19 at 13:23
  • "Bottleneck" is hard to define, but as this code is single-threaded and on the CPU, wasting time on pointlessly precise math seems, well, pointless. – Ian Bell Apr 04 '19 at 13:35
  • You might want to read [Fast multiplication/division by 2 for floats and doubles (C/C++)](https://stackoverflow.com/questions/7720668/fast-multiplication-division-by-2-for-floats-and-doubles-c-c) – BDL Apr 06 '19 at 08:56

1 Answer


> why not use faster division by 256?

Integer division by 256 would be faster, but you're not doing integer division, are you? Floating-point division doesn't generally have shortcuts based on specific values like that. If you have to do a floating-point division, you pay the cost of a floating-point division.

This Godbolt example shows that Clang, at high optimization levels, doesn't find a way to make division by 256.0f any faster than division by 255.0f. GCC and MSVC do get one small optimization: they replace the division with a multiplication by the compile-time computed value of 1/256.0f, which is exactly representable as a float. But you can accomplish much the same thing by having your code explicitly multiply by (1 / 255.0f), so again, there is no advantage.
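As a rough illustration of the reciprocal point (a sketch only, not the code behind the linked Godbolt example; all four function names are made up here): 1/256 is a power of two, so multiplying by it gives bit-identical results to dividing by 256. 1/255 is not exactly representable, so the hand-written reciprocal for 255 is not guaranteed to match a true division bit for bit.

```cpp
#include <cstdint>

// Exact normalization: 255 maps to exactly 1.0f.
inline float divideBy255(std::uint8_t x) {
    return static_cast<float>(x) / 255.0f;
}

// Lossy normalization: 255 maps to 255/256 = 0.99609375f.
inline float divideBy256(std::uint8_t x) {
    return static_cast<float>(x) / 256.0f;
}

// Same results as divideBy256: the reciprocal of a power of two is exact.
inline float multiplyByInv256(std::uint8_t x) {
    return static_cast<float>(x) * (1.0f / 256.0f);
}

// Hand-written reciprocal for 255. The constant 1.0f / 255.0f is rounded,
// so this only approximates a true division by 255.
inline float multiplyByInv255(std::uint8_t x) {
    return static_cast<float>(x) * (1.0f / 255.0f);
}
```

Whether any of these is measurably faster than the others depends on the compiler and target, which is something to check rather than assume.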

Now, there are theoretically faster ways to normalize an unsigned integer value into a float. But these generally rely on direct bit manipulation of a floating-point value, and may not actually be faster than just doing the floating-point math. You'd have to profile them under the specific circumstances in which you intend to use them to know whether they help.
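One such bit-manipulation approach might look like the following. This is a sketch under stated assumptions: `float` is a 32-bit IEEE 754 value, the function name `u8ToFloatBits` is invented for illustration, and `std::memcpy` is used rather than a union because reading an inactive union member is undefined behavior in C++. Note that it produces x/256, not x/255, so 255 maps to 0.99609375f rather than 1.0f.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float is a 32-bit IEEE 754 binary32 value.
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

inline float u8ToFloatBits(std::uint8_t x) {
    float f = 32768.0f;  // 2^15: adjacent floats at this magnitude are 1/256 apart
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits |= x;           // the low 8 mantissa bits now encode x/256
    std::memcpy(&f, &bits, sizeof f);
    return f - 32768.0f; // leaves x/256, in the range [0, 255/256]
}
```

No floating-point division is involved, but there is still a subtraction and a round trip between integer and float representations, so it needs measuring before it can be called faster.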


> what are the actual consequences of slightly lower float values? Anything actually discernible?

The consequences could be anything. As far as modern OpenGL is concerned, the meaning of every value you provide to OpenGL is determined entirely by your code. Indeed, your code could add a division by 0.996 (roughly 255/256) to rescale the number, and then there would be no real difference.

It is easy to write a piece of shader code that will break if you do not pass a properly normalized floating-point value: anything that does `if(value == 1.0f)`, a test that must succeed for an input of 255 if the normalization was done correctly. I can just as easily write code that wouldn't care. There is no general answer; it all depends on what you're doing with it.
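The kind of breakage described here can be mimicked on the CPU side. The following is a hypothetical C++ stand-in rather than real shader code; `isFullyOpaque`, `normalizeExact`, and `normalizeLossy` are names invented for this sketch.

```cpp
#include <cstdint>

// Hypothetical consumer that treats exactly 1.0f as "fully opaque".
inline bool isFullyOpaque(float alpha) {
    return alpha == 1.0f;
}

// Divides by 255, so an input of 255 yields exactly 1.0f.
inline float normalizeExact(std::uint8_t a) {
    return static_cast<float>(a) / 255.0f;
}

// Divides by 256, so an input of 255 yields 0.99609375f.
inline float normalizeLossy(std::uint8_t a) {
    return static_cast<float>(a) / 256.0f;
}
```

With these definitions, `isFullyOpaque(normalizeExact(255))` is true while `isFullyOpaque(normalizeLossy(255))` is false, so a consumer written this way silently changes behavior when the divisor changes.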

Compatibility-profile OpenGL works essentially the same way: it all depends on what you're doing with it. It may be fixed-function rather than shaders, but there's still plenty of room in there for you to define the meaning of that number.

Because the viability of the result is based on information which you cannot know at the level of a simple normalizeIntegerToFloat function, what you ought to do is provide the caller of the function with a choice. Provide an accurate version and a lossy version. What you absolutely should not do is make the lossy version the default. If the user of the function sees a performance problem, they can switch to using the lossy version to help alleviate it.
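A minimal sketch of that recommendation, assuming 8-bit input: the function name `normalizeIntegerToFloat` comes from the paragraph above, while the `Normalization` enum and its enumerators are hypothetical names introduced here. The accurate path is the default, and the lossy path has to be requested explicitly.

```cpp
#include <cstdint>

// Hypothetical policy type: the caller opts in to the lossy path.
enum class Normalization { Accurate, Lossy };

inline float normalizeIntegerToFloat(std::uint8_t value,
                                     Normalization mode = Normalization::Accurate) {
    if (mode == Normalization::Lossy) {
        // Maps 0..255 to [0, 255/256]; 255 does NOT reach 1.0f.
        return static_cast<float>(value) * (1.0f / 256.0f);
    }
    // Maps 0..255 to [0, 1]; 255 becomes exactly 1.0f.
    return static_cast<float>(value) / 255.0f;
}
```

A caller who has measured a real performance problem can then write `normalizeIntegerToFloat(c, Normalization::Lossy)` at that call site, and every other caller keeps the exact mapping.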

Nicol Bolas
  • 1
    I am dividing a float by a power-of-2 integer value known at compile time and am entitled to expect the optimizer to optimize such, e.g. with `float u8tofloat_trick(uint8_t x) { union { float f; uint32_t i; } u; u.f = 32768.0f; u.i |= x; return u.f - 32768.0f; }`, which is a float/256 twice as fast on amd64 as float/255, when such optimisations are available. I do not have to profile FP bit manipulations to create the optimal code; that is the compiler's job. My job is to use the computations most amenable to optimization. – Ian Bell Apr 04 '19 at 18:58
  • @IanBell: You're entitled to expect whatever you want; if the compiler isn't optimizing the code, then it's not optimizing the code, and wanting it to will not cause it to happen. And if you're going to talk about bit manipulation tricks, there are tricks to shove the 8-bits of your integer into the mantissa of a 32-bit IEEE 754 float without doing actual floating-point division. So technically, you don't even need the division. – Nicol Bolas Apr 04 '19 at 19:06
  • There is a regrettable tendency on this board to favour challenging the premises of a question rather than actually answering it. I am here asking an OpenGL question about possible adverse consequences of mapping 0-255 to [0, 0.996f] rather than [0, 1.0f]. Does anyone know of any? – Ian Bell Apr 04 '19 at 19:08
  • @IanBell: "*There is a regrettable tendency on this board to favour challenging the premises of a question rather than actually answering it.*" Well... yes. Your question only exists because of a misconception you have about the difference between two pieces of code. If the misconception is resolved, then the question disappears. As for the consequences of that... they could be anything, because *you decide* what those numbers mean in your shaders. – Nicol Bolas Apr 04 '19 at 19:10
  • @NicoBolas: The optimizer is free to spot that ((float)(uint8))/256 can be done with direct bit manipulation. If I use such tricks directly (which would be foolhardy and assumptive of the platform) then I will still achieve a [0, 0.996] range, so my actual question stands. – Ian Bell Apr 04 '19 at 19:15
  • @IanBell: "*If I use such tricks directly (which would be foolhardy and assumptive of the platform)*" That is the opposite of how optimization gets done. If you care enough to even be asking the question as to whether your normalization code is fast enough, then you care enough to *make it fast*. And if you find that the compiler isn't doing a good enough job, then you care enough to actually do it yourself. Which means researching the platforms of interest and writing conditions based on them to produce optimal results. Otherwise, this is just premature optimization. – Nicol Bolas Apr 04 '19 at 19:18
  • @IanBell: BTW, I edited my answer to speak directly about the consequences of this. – Nicol Bolas Apr 04 '19 at 19:19
  • @NicholBolas. "If you care enough to even be asking the question as to whether your normalization code is fast enough, then you care enough to make it fast. And if you find that the compiler isn't doing a good enough job, then you care enough to actually do it yourself. Which means researching the platforms of interest and writing conditions based on them to produce optimal results." Dangerous nonsense for production code IMV. I am writing C++, not assembly, and am entitled to expect the optimizer to implement, say, float /= 2 faster than float /= 17 when the platform supports such. – Ian Bell Apr 04 '19 at 19:32
  • @IanBell: "*Dangerous nonsense for production code IMV*" How exactly is it dangerous to accept reality? If you need those optimizations, and your compiler doesn't give them to you, then *you must implement them*. And if you don't need those optimizations... why do you care if they happen or not? Optimizing things that don't need optimization is a waste of time. And when writing "production code," *time* is the most important thing. Because you never have enough of it, so it must be efficiently spent. – Nicol Bolas Apr 04 '19 at 19:39
  • @NicholBolas. It is dangerous nonsense to add platform-based conditionals and cryptic pragma malarkey to production sources in most circumstances; and if such is necessary merely to get your compiler to divide by powers of two efficiently at compile time, you need a new vendor. It depends what code you are producing whether time is the most important thing. If you are rushing trash for games industry, maybe. In other areas robustness, consistency, and maintainability are more important. "Need" is not a Boolean. Optimisations have pros and cons determining viability; – Ian Bell Apr 05 '19 at 10:56
  • 1
    @IanBell: "*It depends what code you are producing whether time is the most important thing.*" And if it's not the most important thing, why are you pursuing this? Listen to your chief graphics programmer; you should not fundamentally alter the nature of a mathematical operation unless it buys you something *important*. And for what it's worth, even those of us "rushing trash for games industry" ***don't do this***. – Nicol Bolas Apr 05 '19 at 13:21
  • @NicholBolas Changing a /255 to /256 is not "fundamentally altering the nature of a mathematical operation", *unless* I implement it in a fundamentally different way, such as platform-conditional pseudo-assembly rather than /. I am pursuing this because it strikes me as a safe, simple optimization with no downsides. I'm not sure what "properly normalized floating point number" means. Any shader containing if(red==1.0) is clearly suspect, and how will it "break" if I pass in /256 rather than /255? All that will happen is that, e.g., RGB(FF,00,00) will start being treated more as if it was (FE,00,00). – Ian Bell Apr 08 '19 at 06:40
  • 1
    @IanBell: "*Any shader containing if(red==1.0) is clearly suspect*" Why is it suspect? Who decides what is and is not "suspect"? Shaders do math; what they do internally isn't your business; your job is to provide them the correct data. A test against 1.0, for a value that is fetched from a normalized texture, is not unreasonable. You sometimes see it when looking at the alpha value of a color, for a more concrete example. The difference between completely opaque and not opaque is pretty significant. – Nicol Bolas Apr 08 '19 at 13:28
  • 1
    @IanBell: "*it strikes me as a safe, simple optimization with no downsides*" I've shown that it does not actually make the code faster, so it is not a "optimization", "safe, simple" or otherwise. And whether it has "no downsides" depends *entirely* on how the value gets *used*, which may not be under your control. So your statement is only true if you consider something an optimization that doesn't improve performance and you define any code that breaks as "suspect". But if all you wanted was to reinforce your own beliefs, why bother asking the question? – Nicol Bolas Apr 08 '19 at 13:31
  • @NicholasBolas Float == is always suspect. It only makes sense in the context of particular numeric values being used as flags, which is itself iffy when the flag value is in the expected value range. What is (red==1.0) supposed to be detecting? A clamped value? A value sourced from a uint8 red==0xFF? – Ian Bell Apr 09 '19 at 15:05
  • @NicholaBola " I've shown that it does not actually make the code faster," No you haven't. I've not told you my compiler so how can you possibly have demonstrated that it doesn't optimize float/256 faster than float/255? – Ian Bell Apr 09 '19 at 15:07
  • @IanBell: "*What is (red==1.0) supposed to be detecting?*" That's the problem you have when you ask such a broad, general question: it all depends on ***specifics***. Remember: *you* were the one who put `red` there; my example used a generic `value`. Why? Because you asked a *generic* question. Code could be doing anything, for any reason. And there are legitimate reasons to check to see if a value that you expected to be on the range [0, 1] is 1.0. As I suggested, testing the alpha value is one such case; you might do things differently if the alpha is partially transparent vs. fully opaque. – Nicol Bolas Apr 09 '19 at 15:08
  • @IanBell: "*I've not told you my compiler so how can you possibly have demonstrated that it doesn't optimize float/256 faster than float/255?*" OK, I've provided substantial evidence that compilers don't make that faster. And you can use Godbolt to find your compiler and prove otherwise. Optimization decisions should be based on actual evidence, not gut feelings about what "should" be faster or what might happen in the future or whatever. To do otherwise is just a waste of everyone's time. Your chief graphics programmer is a "chief" for a reason; why don't you just listen to them? – Nicol Bolas Apr 09 '19 at 15:13
  • @NicholBolas "you were the one who put red there; my example used a generic value. Why? Because you asked a generic question." Nonsense. My question was: "Why divide by 255 not faster 256 when converting OpenGL color gun uint8 to float? Is 1.0f rather than 0.996f for 255 crucial?" which clearly relates specifically to OpenGL color gun uint8s. – Ian Bell Apr 09 '19 at 16:15
  • @NicholBolas Your Godbolt example was dividing by 256.0f rather than 256. Godbolt suggests that float/255 is by idiv while /256 is sar 8 on MSVC x64. – Ian Bell Apr 09 '19 at 16:53
  • @IanBell: "*Godbolt suggests that float/255 is by idiv while /256 is sar 8 on MSVC x64.*" Can you provide a link for that? When [I divide by 256, I see it using `mulss`](https://gcc.godbolt.org/z/d5Kw9q). Also, what is a "color gun uint8"? I know what a color is and what a uint8 is, but where does the "gun" part come from? – Nicol Bolas Apr 09 '19 at 17:01
  • @NicholBoras And as to checking alpha == 1.0, if that is what you do you need more code reviews. Because with ==, if you divide your "fully opaque" alpha by three, then multiply by three again, you won't be "fully opaque" anymore. – Ian Bell Apr 09 '19 at 17:18
  • @IanBell: Which is why you won't see people doing that division and multiplication before the test. You test the values you're given, which are based on exact numbers, not the result of some computation. Also, where is your Godbolt link? I'd really like to see how you achieved that. – Nicol Bolas Apr 09 '19 at 17:29
  • @NicholBolas No I was mistaken. Godbolt does not suggest (float)/256 is shifted. "color gun" harks back to electron-control-grids. The days before flat screen. – Ian Bell Apr 09 '19 at 17:32
  • 1
    @NicholBolas. "You test the values you're given, which are based on exact numbers, not the result of some computation." That's assumptive. – Ian Bell Apr 09 '19 at 17:35