Implementing long integer division in VC++ with larger word sizes

Question

I'm making some improvements to a C++ large-integer library primarily targeting Visual C++ (2012 and later) on x64, and I'd really like to improve the speed of my division routine by using wider words.

Right now the operation produces 16 bits of quotient per iteration, with a primitive operation looking basically like this:

    uint16_t *U, *V;   // arrays of 16-bit digits
    uint32_t u = (uint32_t(U[i+1]) << 16) | U[i];
    uint16_t v = V[j];
    uint16_t q = uint16_t(u/v);

That compiles to a DIV instruction with 32-bit operands (and EDX zeroed), which is fine, but the routine as a whole is slow because of the large number of iterations. I'd really like to use DIV's support for 64/32 or even 128/64 division, but I cannot convince Visual C++ to emit either form. Dividing a 64-bit number by a 32-bit number results in a call to the internal 64/64 division routine, which is not particularly fast and is total overkill (my code already guarantees that the quotient never overflows). And I can't even touch the 128/64 division, since there's no support for 128-bit numbers.
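For illustration, here is a minimal sketch of what the wider primitive could look like with 32-bit digits: each step divides a 64-bit partial dividend by a 32-bit divisor and yields 32 bits of quotient. The helper name `short_div_32` and the use of `std::vector` are assumptions made for the example, not part of the library in question, and only the single-digit-divisor case is shown. Whether the 64-bit `/` and `%` become a single hardware divide depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: divides the multiword number `u` (32-bit digits,
// least significant digit first) in place by the single digit `v`
// and returns the remainder.
inline uint32_t short_div_32(std::vector<uint32_t>& u, uint32_t v)
{
    assert(v != 0);
    uint32_t rem = 0;
    for (std::size_t i = u.size(); i-- > 0; ) {
        // rem < v, so cur < v * 2^32 and the quotient digit fits in 32 bits.
        uint64_t cur = (uint64_t(rem) << 32) | u[i];
        u[i] = uint32_t(cur / v);
        rem  = uint32_t(cur % v);
    }
    return rem;
}
```

The invariant `rem < v` is what guarantees that the quotient never overflows, which is the same property the question relies on.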

Normally this is where intrinsics would come in, but VC++ doesn't seem to offer an intrinsic to use the high operand in division (as it does through __umulh for multiplication). Without support for inline assembly in x64, it's looking like the only solution is to reimplement the routine entirely in ASM, which I want to avoid if at all possible.

How can I use the full power of DIV in a VC++ long division routine?

All I can offer is this: https://stackoverflow.com/questions/8453146/128-bit-division-intrinsic-in-visual-c — David Wohlferd, Oct 11 '16 at 03:00
@DavidWohlferd Hmm, ominous. Though I do like the suggestion to wrap only the primitive division in an assembly function... slightly more call overhead, but saves me from trying to beat the optimizer on the rest of the function. — Sneftel, Oct 11 '16 at 07:41
Glad it helped, if only a little. Sorry I can't offer better news. Maybe something got added since VS2012? — David Wohlferd, Oct 11 '16 at 08:21
finally MS agreed to add [128-bit division intrinsic to Visual Studio](https://stackoverflow.com/a/56033459/995714) — phuclv, May 10 '19 at 16:28
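
As a sketch of how the intrinsic mentioned in the last comment might be used: to my knowledge it is `_udiv128`, declared in `<immintrin.h>` for x64 in Visual Studio 2019 and later, so it does not help on VS2012 itself. The wrapper name `div_128_by_64` and the macro `HAVE_UDIV128` are hypothetical, and the fallback branch assumes a compiler with `unsigned __int128` (GCC or Clang).

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && _MSC_VER >= 1920 && defined(_M_X64)
#include <immintrin.h>   // _udiv128
#define HAVE_UDIV128 1   // hypothetical feature macro for this example
#endif

// Hypothetical wrapper: divides the 128-bit value (hi:lo) by d, returns the
// 64-bit quotient and stores the remainder in *rem.
// Precondition: hi < d, otherwise the quotient does not fit in 64 bits.
inline uint64_t div_128_by_64(uint64_t hi, uint64_t lo, uint64_t d, uint64_t* rem)
{
    assert(hi < d);
#if defined(HAVE_UDIV128)
    unsigned __int64 r = 0;
    unsigned __int64 q = _udiv128(hi, lo, d, &r);
    *rem = r;
    return q;
#else
    // Assumption: the compiler provides unsigned __int128 (GCC/Clang).
    unsigned __int128 n = ((unsigned __int128)hi << 64) | lo;
    *rem = (uint64_t)(n % d);
    return (uint64_t)(n / d);
#endif
}
```

With 64-bit digits, a long division loop would call this once per quotient digit, passing the running remainder as `hi`, which keeps the `hi < d` precondition satisfied.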
