0

I am looking to take the log base n (10 would be fine) of a 256 bit unsigned integer as a floating point in rust, with no loss of precision. It would seem to me that I need to implement an 8xf64 512 bit float 512 type and use a Taylor series to approximate ln and then the log. I know there are assembly methods to obtain the log of an f64. I am wondering if anyone on stack overflow can think of a divide and conquer or other method which would be more efficient. I would be amenable to inline assembly operating on the 8xf64 512 bit array.

  • In order to decide if a different algorithm would be more efficient we'd first need to understand the algorithm you're proposing. I don't think there is enough detail here to make the comparison, and that includes how you're doing arithmetic with your 512-bit floating type, and maybe why you're implementing such a type. – President James K. Polk Jun 06 '21 at 21:03
  • 1
    The reason for the 512 bit float is to obtain the log of a 256 unsigned integer without loss of precision. http://web.mit.edu/tabbott/Public/quaddouble-debian/qd-2.3.4-old/docs/qd.pdf - I was thinking of an "octo double", using a Taylor series or newton's method to approximate ln similar to: https://barandis.github.io/qd/src/qd/quad/trans.rs.html#152-190 - the reason for the large integer is because I am working on a computer with a 256 bit word size, doing external computation about the state of that computer on an x86. – Bryan Apperson Jun 06 '21 at 22:19
  • 1
    Are you implementing some subset of 'extended' IEEE-754 semantics for this float type? i.e., do you need correct rounding modes, `log` function ULP error bounds? I would consider the [mpfr library](https://www.mpfr.org/) with [rust bindings](https://crates.io/crates/rug) - at least for prototyping or testing a custom approach. 'Recommendation' questions tend to get closed - though some are very technical in nature, and it's difficult to get advice elsewhere. Maybe a little more detail on exact requirements would be good here. – Brett Hale Jun 07 '21 at 01:28
  • I hadn't gotten that far on design yet, really the requirement is that I could take the log of a 256 bit integer without losing any precision from the least to the most significant bit, and ideally do it without arbitrary precision, the faster at runtime, the better. – Bryan Apperson Jun 07 '21 at 01:49

1 Answers1

1

This might be a useful starting point / outline of an algorithm. IDK if it will get you exact results, like error <= 0.5ulp (i.e. the last bit of the mantissa of your 512-bit float correctly rounded), or even error <= 1 ulp. Perhaps worth looking into what extended-precision calculators like bc / dc / calc do.

I think log converges quickly, so if you're going to do Newton iterations to refine, this bit-scan method might be a fast way to get a good starting point. Even if you only really need about 256 mantissa bits correct, I don't know how big a polynomial it would take to get that, and each multiply / add / fma would be on 512-bit (8x) or 320-bit (5x double precision).


Start by converting integer to binary float

For normal-sized floating-point numbers, the usual method takes advantage of the logarithmic nature of binary floating point. Without 256-bit HW float, you'll want to find the ilog2(int) yourself, i.e. position of the highest set bit (Efficiently find least significant set bit in a large array?).

Then treat your 256-bit integer as the mantissa of a number in the [1..2) or [0.5 .. 1) range, and yes use a polynomial approximation for log2() that's accurate over that limited range. (Before actual soft-float stuff, you might want to left-shift the number so it's normalized, i.e. the highest set bit is at the top. i.e. x <<= clz(x).


Then a polynomial approximation over the mantissa

And then add the integer exponent + log_approx(mantissa) => log2(x).

Efficient implementation of log2(__m256d) in AVX2 has more detail on implementing log2(double) (with SIMD doing 4 at a time, very different from doing one extended precision calculation).

It includes some links to implementations, e.g. Agner Fog's VCL using the ratio of two polynomials instead of one larger polynomial, and various tricks to maintain as much precision as possible: https://github.com/vectorclass/version2/blob/9874e4bfc7a0919fda16596144d393da5f8bf6c0/vectormath_exp.h#L942. Such as further range reduction: if x > SQRT2*0.5, then increment the exponent and double the mantissa. (If 512-bit FP division is really expensive, you might just use more terms in one polynomial.) VCL is currently Apache licensed, so feel free to copy as much as you want from it into anything.

IDK if there are more tricks that might become more valuable for big extended precision, or for soft-float, which that implementation doesn't use. VCL's math functions spend more effort to maintain high precision than some faster approximations, but they're not exact.


Do you really need 512-bit float? Maybe only 320-bit (5x double)?

If you don't need more exponent-range than a double, you might be able to extend the technique to wider floats, taking advantage of hardware FP to get 52 or 53 mantissa bits per 64-bit chunk. (From comments, apparently you're already planning to do that.)

You might not need 512-bit float to have sufficient precision. 256/52 = 4.92, so only 5x double chunks have more precision (mantissa bits) than your input, and could exactly represent any 256-bit integer. (IEEE double does have a large enough exponent range; -1022 .. +1023). And have enough to spare that log2(int) should map each 256-bit input to a unique monotonic output, even with some rounding error.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • By the log2(int) do you mean find the ilog2(int) and then bit shift clz/nlz=n left, then start treating the integer as a mantissa to the n? – Bryan Apperson Jun 06 '21 at 22:24
  • @BryanApperson: Yes, `ilog2(int)`, the integer part of the log2, because that's cheap and easy, unlike the exact `log2(int)` which is the whole function you're trying to compute. I'll fix my answer. Specifically you want a shift-count that lets you normalize your integer to be the mantissa of a float, breaking it up into an integer exponent and a fractional part, just like floating point does. – Peter Cordes Jun 06 '21 at 22:26