How to compute 2⁶⁴/n in C?

Question

How to compute the integer division, 2⁶⁴/n? Assuming:

unsigned long is 64-bit
We use a 64-bit CPU
1 < n < 2⁶⁴

If we do 18446744073709551616ul / n, we get warning: integer constant is too large for its type at compile time. This is because we cannot express 2⁶⁴ in a 64-bit CPU. Another way is the following:

#define IS_POWER_OF_TWO(x) ((x & (x - 1)) == 0)

unsigned long q = 18446744073709551615ul / n;
if (IS_POWER_OF_TWO(n))
    return q + 1;
else
    return q;

Is there any faster (CPU cycle) or cleaner (coding) implementation?

128 bit would certainly work. Alternatively, you can try using double division... — mvp, Apr 08 '19 at 02:52
if your compiler has `__int128` then the clean way is `__int128 q = ((__int128)18446744073709551615ull + 1)/n` — phuclv, Apr 08 '19 at 02:57
@mvp AFAIK, `double` is 64-bit, so we need 128-bit quadruple. — Wu Yongzheng, Apr 08 '19 at 03:39
@phuclv I tried to disassemble your code. It turns out to be calling `__divti3`, which is a bit overkill for the job. — Wu Yongzheng, Apr 08 '19 at 06:13
yes, I mean clean in terms of code. I didn't say that it's efficient in terms of CPU cycle — phuclv, Apr 08 '19 at 06:41
Seems like it would be nicer to write `18446744073709551615ul` as `-1ul`. — Nate Eldredge, Apr 09 '19 at 03:54
@NateEldredge I didn't know we have negative unsigned constants. Yes. That's definitely more readable. — Wu Yongzheng, Apr 12 '19 at 06:09

score 14 · Accepted Answer · edited Sep 14 '20 at 19:38

14

I'll use uint64_t here (which needs the <stdint.h> include) so as not to require your assumption about the size of unsigned long.

phuclv's idea of using -n is clever, but can be made much simpler. As unsigned 64-bit integers, we have -n = 2⁶⁴-n, then (-n)/n = 2⁶⁴/n - 1, and we can simply add back the 1.

uint64_t divide_two_to_the_64(uint64_t n) {
  return (-n)/n + 1;
}

The generated code is just what you would expect (gcc 8.3 on x86-64 via godbolt):

    mov     rax, rdi
    xor     edx, edx
    neg     rax
    div     rdi
    add     rax, 1
    ret

edited Sep 14 '20 at 19:38

Jean-François Fabre

137,073
23
153
219

answered Apr 09 '19 at 03:52

Nate Eldredge

48,811
6
54
82

1

Quite an astute observation. It seems to hold with an `if (twoPow64divphuclv (v) != twoPow64divnate (v)) { /* throw error */ }` for as many numbers as I was willing to wait through `:)` Though I did make yours `uint64_t twoPow64divnate (uint64_t n)` – David C. Rankin Apr 09 '19 at 04:14
@DavidC.Rankin if you can do 1 billion iterations a second, it'll take you ~585 years to loop through all possible `int64_t` values, so unfortunately you have to test a 32-bit version and then check if it work for some 64-bit values – phuclv Apr 09 '19 at 09:55
nice but wouldn't it be better to use standard `uint64_t` instead of `unsigned long` ? unsigned long isn't guaranteed to be 64 bits, even on a 64 bits compiler – Jean-François Fabre Sep 14 '20 at 19:27
1

@Jean-FrançoisFabre: Sure, if you like. The question explicitly stated their assumption that `unsigned long` is 64 bits on the system of interest, so I was just staying consistent with that. But I've changed it now. – Nate Eldredge Sep 14 '20 at 19:29

phuclv · Answer 2 · 2020-09-10T07:53:04.223

I've come up with another solution which was inspired by this question. From there we know that

(a₁ + a₂ + a₃ + ... + a_n)/n =

(a₁/n + a₂/n + a₃/n + ... + a_n/n) + (a₁ % n + a₂ % n + a₃ % n + ... + a_n % n)/n

By choosing a₁ = a₂ = a₃ = ... = a_n-1 = 1 and a_n = 2⁶⁴ - n we'll have

(a₁ + a₂ + a₃ + ... + a_n)/n = (1 + 1 + 1 + ... + (2⁶⁴ - n))/n = 2⁶⁴/n

= [(n - 1)*1/n + (2⁶⁴ - n)/n] + [(n - 1)*0 + (2⁶⁴ - n) % n]/n

= (2⁶⁴ - n)/n + ((2⁶⁴ - n) % n)/n

2⁶⁴ - n is the 2's complement of n, which is -n, or we can also write it as ~0 - n + 1. So the final solution would be

uint64_t twoPow64div(uint64_t n)
{
    return (-n)/n + (n + (-n) % n)/n + (n > 1ULL << 63);
}

The last part is to correct the result, because we deal with unsigned integers instead of signed ones like in the other question. Checked both 32 and 64-bit versions on my PC and the result matches with your solution

On MSVC however there's an intrinsic for 128-bit division, so you can use like this

uint64_t remainder;
return _udiv128(1, 0, n, &remainder);

which results in the cleanest output

    mov     edx, 1
    xor     eax, eax
    div     rcx
    ret     0

Here's the demo

On most x86 compilers (one notable exception is MSVC) long double also has 64 bits of precision, so you can use either of these

(uint64_t)(powl(2, 64)/n)
(uint64_t)(((long double)~0ULL)/n)
(uint64_t)(18446744073709551616.0L/n)

although probably the performance would be worse. This can also be applied to any implementations where long double has more than 63 bits of significand, like PowerPC with its double-double implementation

There's a related question about calculating ((UINT_MAX + 1)/x)*x - 1: Integer arithmetic: Add 1 to UINT_MAX and divide by n without overflow with also clever solutions. Based on that we have

2⁶⁴/n = (2⁶⁴ - n + n)/n = (2⁶⁴ - n)/n + 1 = (-n)/n + 1

which is essentially just another way to get Nate Eldredge's answer

Here's some demo for other compilers on godbolt

See also:

Since `(-n) /n` equals (2^64-n)/n = 2^64/n - 1, can't we just say `(-n)/n + 1`? It seems to give the right answer in all of my tests. Am I missing something? — Nate Eldredge, Apr 09 '19 at 03:37
@NateEldredge good point. I didn't think of that but it seems reasonable — phuclv, Apr 09 '19 at 09:46

Brendan · Answer 3 · 2019-04-09T04:14:40.973

We use a 64-bit CPU

Which 64-bit CPU?

In general, if you multiply a number with N bits by another number that has M bits, the result will have up to N+M bits. For integer division it's similar - if a number with N bits is divided by a number with M bits the result will have N-M+1 bits.

Because multiplication is naturally "widening" (the result has more digits than either of the source numbers) and integer division is naturally "narrowing" (the result has less digits); some CPUs support "widening multiplication" and "narrowing division".

In other words, some 64-bit CPUs support dividing a 128-bit number by a 64-bit number to get a 64-bit result. For example, on 80x86 it's a single DIV instruction.

Unfortunately, C doesn't support "widening multiplication" or "narrowing division". It only supports "result is same size as source operands".

Ironically (for unsigned 64-bit divisors on 64-bit 80x86) there is no other choice and the compiler must use the DIV instruction that will divide a 128-bit number by a 64-bit number. This means that the C language forces you to use a 64-bit numerator, then the code generated by the compiler extends your 64 bit numerator to 128 bits and divides it by a 64 bit number to get a 64 bit result; and then you write extra code to work around the fact that the language prevented you from using a 128-bit numerator to begin with.

Hopefully you can see how this situation might be considered "less than ideal".

What I'd want is a way to trick the compiler into supporting "narrowing division". For example, maybe by abusing casts and hoping that the optimiser is smart enough, like this:

  __uint128_t numerator = (__uint128_t)1 << 64;
  if(n > 1) {
      return (uint64_t)(numerator/n);
  }

I tested this for the latest versions of GCC, CLANG and ICC (using https://godbolt.org/ ) and found that (for 64-bit 80x86) none of the compilers are smart enough to realise that a single DIV instruction is all that is needed (they all generated code that does a call __udivti3, which is an expensive function to get a 128 bit result). The compilers will only use DIV when the (128-bit) numerator is 64 bits (and it will be preceded by an XOR RDX,RDX to set the highest half of the 128-bit numerator to zeros).

In other words, it's likely that the only way to get ideal code (the DIV instruction by itself on 64-bit 80x86) is to resort to inline assembly.

For example, the best code you'll get without inline assembly (from Nate Eldredge's answer) will be:

    mov     rax, rdi
    xor     edx, edx
    neg     rax
    div     rdi
    add     rax, 1
    ret

...and the best code that's possible is:

    mov     edx, 1
    xor     rax, rax
    div     rdi
    ret

A minor disadvantage (or maybe advantage?) of the "best possible" code is that if you should accidentally try it with `n=1`, you will get a divide error exception. The other suggestions all return 0. — Nate Eldredge, Apr 09 '19 at 04:24
@NateEldredge: Yes - that's why I needed to put the `if(n > 1)` when testing in godbolt (to make sure it's possible for compiler to know `n` can't be 1). — Brendan, Apr 09 '19 at 04:34
Current compilers are also not smart enough to convert a division of a `__(u)int128_t` by a constant into a multiplication and calls the division function instead. And instead of `xor rax, rax` you should use `xor eax, eax` — phuclv, Apr 09 '19 at 09:50

Matt Timmermans · Answer 4 · 2019-04-08T03:00:30.113

1

Your way is pretty good. It might be better to write it like this:

return 18446744073709551615ul / n + ((n&(n-1)) ? 0:1);

The hope is to make sure the compiler notices that it can do a conditional move instead of a branch.

Compile and disassemble.

edited Apr 08 '19 at 03:00

answered Apr 08 '19 at 02:50

Matt Timmermans

53,709
3
46
87

Interestingly, gcc 7 and 8 both compile your code with a conditional set, whose result is added. For the original code, gcc 7 does a conditional move. gcc 8 does a very clever `cmp rdi, 1 ; adc rax, 0` (!). – Nate Eldredge Apr 09 '19 at 04:33
Wow, that is very clever. I would be happy with any of these results, but your `-n/n+1` is much better, of course -- I'm disappointed that I didn't think of it. – Matt Timmermans Apr 09 '19 at 04:52

How to compute 2⁶⁴/n in C?

4 Answers4

Linked

Related