fast rounding fixed point number

Question

Let's say I treat my integer as if it has 4 fraction bits. And now 0 is zero, 16 is one, 32 is two, and it goes on. When rounding, numbers in range [-7, 7] becomes 0, [8, 23] becomes 16.

My code is this:

std::int64_t my_round(std::int64_t n) {
    auto q = n / 16;
    auto r = n % 16;
    if (r >= 0) {
        if (r >= 8) {
            ++q;
        }
    } else {
        if (r <= -8) {
            --q;
        }
    }
    return q * 16;
}

It's a lot of code for such a simple task. I wonder if there is a faster way to do it. I only need to support 64bit signed integer.

Edit: There was a comment (I don't remember whom made that comment) suggested adding 15 and masking the lower bits. It didn't work. But with some trial and error, I come up with this.

std::int64_t my_round2(std::int64_t n) {
    if (n >= 0) {
        n += 8;
    }
    else {
        n += 7;
    }
  return n & (~15ll);
}

I have no idea. but my_round2 seems to give the same result as my_round and is 20 times faster. If there is a way to remove the branch it would be much better.

You might want to check the assembly code that gets generated for this function when you compile with optimizations enabled; since you are using 8 and 16 which are powers-of-2, it's likely that the compiler is able to replace the integer-math operations with bitwise operations and that therefore the resulting executable is already quite efficient. If so, then rewriting the function to be shorter likely won't gain you any runtime efficiency, but it would make the function harder for a human reader to understand. — Jeremy Friesner, May 07 '21 at 23:36
`n += 8;` in `my_round2()` is subject to overflow. Wei Hsieh, how do you want to handle overflow? — chux - Reinstate Monica, May 08 '21 at 00:01
@chux-ReinstateMonica I found out `my_round` overflows too. Those two function still give the same result even when overflow happened. Luckily the overflow is not likely to happen. And int64_t won't be a good choice when working with values that large. — W.H, May 08 '21 at 00:15
@chux-ReinstateMonica -16. Actually it's easier to think about if we scale the fixed point number to real number, that is by dividing 16. so -15 becomes -0.9375. that would round to -1 which is -16 in my fixed point representation. — W.H, May 08 '21 at 03:29
@WeiHsieh The rounding mode employed here rounds half-way cases away from zero. This differs from the common FP "round to nearest, ties to even". I suspect you want speed more than ["reduce bias"](https://blogs.sas.com/content/iml/2019/11/11/round-to-even.html) practice, but the quality of your algorithm may improve with a different rounding. — chux - Reinstate Monica, May 08 '21 at 11:56
@WeiHsieh "I wonder if there is a faster way to do it." risks [premature optimization](https://softwareengineering.stackexchange.com/questions/80084/is-premature-optimization-really-the-root-of-all-evil). With your `my_round2()` the next steps are processor dependent, what works well on a 64-bit machine may perform horribly on a 16-bit embedded processor. Perhaps tag with the processor family of interest? — chux - Reinstate Monica, May 08 '21 at 12:12

score 4 · Accepted Answer · answered May 08 '21 at 08:51

4

With

return (n + 8 + (n>>63)) & (~15ll);

one can shave off the branch from my_round2(), and ensure the original symmetry at zero. The idea is that signed type >> (sizeof(signed type) * 8 - 1) is -1 for negative values and 0 for positive values.

Clang is able to produce branchless code for the original my_round2() but that is still one instruction longer than the proposed routine here. With arm64 the savings are even greater.

answered May 08 '21 at 08:51

Aki Suihkonen

19,144
1
36
57

3

Nice. An alternative is to replace `+(n>>63)` with `-(n<0)` which to me is clearer, but which may also depend more on compiler optimization to produce fast code. – nielsen May 08 '21 at 09:30
1

"signed type >> (sizeof(signed type) * 8 - 1) is -1 for negative values" is _implementation-defined_. That is also an example of IDB in the C spec: "An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right." `-(n<0)` (@nielsen) is a good alternative. – chux - Reinstate Monica May 08 '21 at 11:40
Yes, in two's complement and all boiler plate + of course there are systems with 9 bit chars. -- except, that std::int64_t AFAIK is guaranteed to be two's complement. (https://stackoverflow.com/q/13604137/1716339) – Aki Suihkonen May 08 '21 at 11:54
1

`int64_t` must be two's complement and unpadded, yet code still has implementation-defined behavior. Yes certainly common for the the IDB to behave as hoped but it not required - even if encoding is 2's complement. The concern is that compilers of tomorrow may take advantage of the spec is ways not anticipated today. Much code of days gone by coded to common practice fail today for like reasons. Your choice. – chux - Reinstate Monica May 08 '21 at 12:20

Zoso · Answer 2 · 2021-05-08T16:17:28.783

1

How about something like

std::int64_t my_round2(std::int64_t n) {
    int sign = n >= 0 ? 1 : -1;
    n = sign > 0 ? n: -n;
    int64_t n1 = (n + 8) >>4;
    n1<<= 4;
    return n1 * sign;
}

This still has the overflow problem though.

edited May 08 '21 at 16:17

answered May 08 '21 at 04:43

Zoso

3,273
1
16
27

@chux-ReinstateMonica Strange is an apt description of my choice. I can't recollect why I used `size_t` out of the blue. Have changed it to `int64_t`. – Zoso May 08 '21 at 16:18
Aside: With `int64_t n1`, `n1<<= 4;` can result in _undefined behavior_: With `uint64_t n1`, code does not have UB, but various _implementation defined behavior_. – chux - Reinstate Monica May 08 '21 at 17:14

score 1 · Answer 3 · answered May 08 '21 at 05:06

1

As long as your integer representation is twos-complement (which is pretty much required by other things in the C11 standard), just add 8 and mask off the lower bits:

int64_t my_round2(int64_t n) {
    return (n + 8) & ~UINT64_C(15);
}

answered May 08 '21 at 05:06

Chris Dodd

119,907
13
134
226

2

If n is -8, the result is 0, but it seems OP wants -1 in that case. – nielsen May 08 '21 at 09:18

fast rounding fixed point number

3 Answers3