You can build a 2N-bit multiplier from multiple N-bit multipliers.
public static ulong mul64hi(ulong x, ulong y)
{
ulong accum = ((ulong)(uint)x) * ((ulong)(uint)y);
accum >>= 32;
ulong term1 = (x >> 32) * ((ulong)(uint)y);
ulong term2 = (y >> 32) * ((ulong)(uint)x);
accum += (uint)term1;
accum += (uint)term2;
accum >>= 32;
accum += (term1 >> 32) + (term2 >> 32);
accum += (x >> 32) * (y >> 32);
return accum;
}
It's just elementary-school long multiplication, really.
With signed numbers, it's a bit harder, because if intermediate results carry into the sign bit everything goes wrong. A long
can't hold the result of a 32-bit by 32-bit multiply without that happening, so we have to do it in smaller chunks:
public static long mul64hi(long x, long y)
{
const long thirtybitmask = 0x3FFFFFFF;
const long fourbitmask = 0x0F;
long accum = (x & thirtybitmask) * (y & thirtybitmask);
accum >>= 30;
accum += ((x >> 30) & thirtybitmask) * (y & thirtybitmask);
accum += ((y >> 30) & thirtybitmask) * (x & thirtybitmask);
accum >>= 30;
accum += ((x >> 30) & thirtybitmask) * ((y >> 30) & thirtybitmask);
accum += (x >> 60) * (y & fourbitmask);
accum += (y >> 60) * (x & fourbitmask);
accum >>= 4;
accum += (x >> 60) * (y >> 4);
accum += (y >> 60) * (x >> 4);
return accum;
}
Inspired by harold's comment about Hacker's Delight, the signed version can be made just as efficient as the other, by carefully controlling whether intermediate results are or are not signed:
public static long mul64hi(long x, long y)
{
ulong u = ((ulong)(uint)x) * ((ulong)(uint)y);
long s = u >> 32;
s += (x >> 32) * ((long)(uint)y);
s += (y >> 32) * ((long)(uint)x);
s >>= 32;
s += (x >> 32) * (y >> 32);
return s;
}