5

How to efficiently compute 2¹²⁸ % n, where n is uint64_t (and non-zero)?

If I had access to a narrowing division like _udiv128 intrinsic of MSVC I could do:

uint64_t remainder;
_udiv128(1, 0, n, &remainder);
_udiv128(remainder, 0, n, &remainder);

But that's two slow instructions, not to mention there are CPUs like ARM which do not have the narrowing division. Is there a better way?

Related: How to compute 2⁶⁴/n in C?

Ecir Hana
  • 10,864
  • 13
  • 67
  • 117
  • unfortunately you can optimize division by a constant but not the case where dividend is a constant. If it's too slow then you're probably doing lots of divisions like that and should optimize with SIMD and multithreading. There's a [library for optimizing that called libdivide](https://libdivide.com/) – phuclv Jan 12 '23 at 13:00
  • 1
    in some cases you can utilize the reciprocal instruction in ARM and x86 though: [How to optimize for dividing constant?](https://stackoverflow.com/q/44862898/995714) – phuclv Jan 12 '23 at 13:08
  • Have you looked at the `libdivide` library? I think this section could be useful: https://github.com/ridiculousfish/libdivide/blob/3bd34388573681ce563348cdf04fe15d24770d04/libdivide.h#L467 – Joseph Wood Jan 15 '23 at 19:34
  • @JosephWood AFAIK, `libdivide` is good only when we compute `a%b` and `b` is a constant but here `a` is the constant, not `b` (at least, it is not specified and it would be unexpected since the result can be precomputed at compile-time in that case). – Jérôme Richard Jan 15 '23 at 20:22
  • 3
    @JérômeRichard: The libdivide library is intended to provide, via its API, optimizations for repeated divisions by the same divisor, but Joseph Wood did not point to API information but specifically to source code for a routine that divides a 128-bit number by a 64-bit number. I think the intent there may be that the source code contains some useful information that could be adapted to this problem. – Eric Postpischil Jan 15 '23 at 20:35
  • 1
    @EricPostpischil, that's exactly what I meant. Sorry about the confusion. – Joseph Wood Jan 16 '23 at 00:39
  • related: [How to divide 2¹²⁸ - 1 by a 64-bit divisor, where I only need the 64 LSBs of the quotient?](https://stackoverflow.com/q/70051781/995714) – phuclv Jan 19 '23 at 01:21
  • Have you benchmarked that code and seen that it is too slow for your purpose? (This is the obligatory premature optimization comment) – anton-tchekov Jul 09 '23 at 21:47

0 Answers0