Is there a way to create 128-bit float/fixed-point emulation with only doubles in C?

Question

I'm looking for a set of functions that can add, subtract, multiply, and divide like 128-bit floats. I would prefer something fast, even if it does involve some abnormal precision loss.

I realize that there is a `__float128` type in C++ (a GCC extension), and there might be a 128-bit type in C, but sadly I can only use doubles in my program. (Shortened explanation: I'm using WebAssembly, which allows you to use C-like code online, and I am using a WASM-to-C "compiler." That language only has 32-bit and 64-bit floats and ints.)

Please also provide a method to convert it to float, and a simple English or JavaScript explanation of how to initialize it. No other methods are needed!

I've seen a similar question in another language with only 32-bit floats: WebGL highp to 64-bit question

If anyone can create 192-bit or 256-bit floats/fixed-points, that would be useful as well.

There are techniques like [quad-double](https://www.davidhbailey.com/dhbpapers/quad-double.pdf) (essentially splitting a larger significand across multiple doubles). Is that the sort of thing you want? It needs FMA to be effective, though (it can be worked around, but it gets ugly). — harold, Feb 03 '23 at 20:10
Sadly, WebAssembly does not have FMA, but yes, other than that, that is exactly what I would like. (They are considering adding it to the spec, but it won't work for now.) — Infigon, Feb 03 '23 at 20:23
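For illustration, here is a minimal sketch of the two-term version of that idea (double-double) that works around the missing FMA by using Dekker's exact product with a Veltkamp split. The `dd` type and the `dd_*` helper names are hypothetical, not from any library; the addition and division follow the style of the QD package. It assumes strict IEEE 754 binary64 round-to-nearest evaluation (which WebAssembly's f64 provides) and no `-ffast-math`-style reassociation. A double-double carries roughly 106 significand bits, somewhat less than IEEE binary128's 113, and keeps double's exponent range.

```c
#include <assert.h>

/* Double-double: the value is the unevaluated sum hi + lo, |lo| <= ulp(hi)/2. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum: s + e == a + b exactly (round-to-nearest assumed). */
static dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    dd r = { s, (a - (s - bb)) + (b - bb) };
    return r;
}

/* Fast2Sum: like two_sum but cheaper; requires |a| >= |b|. */
static dd quick_two_sum(double a, double b) {
    double s = a + b;
    dd r = { s, b - (s - a) };
    return r;
}

/* Veltkamp split: a == hi + lo, each half with at most 26 significant bits,
   so products of halves are exact. May overflow for |a| above about 2^996. */
static void split(double a, double *hi, double *lo) {
    double t = 134217729.0 * a; /* 2^27 + 1 */
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker's TwoProduct without FMA: p + e == a * b exactly (barring over/underflow). */
static dd two_prod(double a, double b) {
    double ah, al, bh, bl;
    double p = a * b;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    dd r = { p, ((ah * bh - p) + ah * bl + al * bh) + al * bl };
    return r;
}

/* Initialization and conversion back to a plain double. */
static dd dd_from_double(double x) { dd r = { x, 0.0 }; return r; }
static double dd_to_double(dd x) { return x.hi + x.lo; }

static dd dd_add(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    dd t = two_sum(a.lo, b.lo);
    s.lo += t.hi;
    s = quick_two_sum(s.hi, s.lo);
    s.lo += t.lo;
    return quick_two_sum(s.hi, s.lo);
}

static dd dd_neg(dd a) { dd r = { -a.hi, -a.lo }; return r; }
static dd dd_sub(dd a, dd b) { return dd_add(a, dd_neg(b)); }

static dd dd_mul(dd a, dd b) {
    dd p = two_prod(a.hi, b.hi);
    p.lo += a.hi * b.lo + a.lo * b.hi;   /* lo*lo term is below working precision */
    return quick_two_sum(p.hi, p.lo);
}

/* Long division: three quotient digits, each refined from the remainder. */
static dd dd_div(dd a, dd b) {
    assert(b.hi != 0.0); /* division by zero is not handled */
    double q1 = a.hi / b.hi;
    dd r = dd_sub(a, dd_mul(dd_from_double(q1), b));
    double q2 = r.hi / b.hi;
    r = dd_sub(r, dd_mul(dd_from_double(q2), b));
    double q3 = r.hi / b.hi;
    return dd_add(quick_two_sum(q1, q2), dd_from_double(q3));
}
```

Initializing is just `dd_from_double(x)`, and `dd_to_double` converts back by adding the two halves. For example, `dd_sub(dd_add(dd_from_double(1e16), dd_from_double(1.0)), dd_from_double(1e16))` recovers the `1.0` that plain double arithmetic would lose. Quad-double, as in the linked paper, extends the same building blocks to four terms (about 212 bits), at a noticeably higher cost.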
What's the story with the small mantissas, though? Quad-double is more for larger mantissas; it doesn't increase the exponent range. Depending on what you're doing with these numbers, there are some other techniques that may apply, such as [LNS](https://en.wikipedia.org/wiki/Logarithmic_number_system) — harold, Feb 03 '23 at 21:04
I'm just saying that the mantissa can be basically anything, because that may or may not have a performance impact. It's just that I'm new to high precision. — Infigon, Feb 03 '23 at 21:05
Perhaps you meant the exponent can be basically anything, not the mantissa? — Ian Abbott, Feb 03 '23 at 21:09
Oh, sorry. I guess I had the name wrong (I got them confused). Thanks for the fix! — Infigon, Feb 03 '23 at 21:11
If we can ignore the exponent completely, then suddenly we're dealing with fixed-point numbers, which are a lot simpler (it's just rescaled integer arithmetic) — harold, Feb 03 '23 at 21:24
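As a sketch of the "rescaled integer arithmetic" idea, using the 64-bit integers the question says are available: a signed 128-bit two's-complement integer stored as two `uint64_t` halves and read as Q64.64 (64 integer bits including the sign, 64 fraction bits). The Q64.64 split, the `fx128` type and the `fx_*`/`mul64` names are assumptions chosen for illustration. Portable C has no 64×64→128 multiply, so `mul64` builds one from 32-bit halves. Overflow is not detected, and multiplication truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit two's-complement integer; as Q64.64 the value is (hi:lo) / 2^64. */
typedef struct { uint64_t hi, lo; } fx128;

static int fx_is_neg(fx128 a) { return (a.hi >> 63) != 0; }

/* Addition modulo 2^128: carry out of the low limb goes into the high limb. */
static fx128 fx_add(fx128 a, fx128 b) {
    fx128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}

/* Two's-complement negation: ~x + 1 across both limbs. */
static fx128 fx_neg(fx128 a) {
    fx128 r;
    r.lo = ~a.lo + 1;
    r.hi = ~a.hi + (r.lo == 0);
    return r;
}

static fx128 fx_sub(fx128 a, fx128 b) { return fx_add(a, fx_neg(b)); }

/* Full unsigned 64x64 -> 128-bit product built from 32-bit pieces. */
static fx128 mul64(uint64_t a, uint64_t b) {
    uint64_t a0 = a & 0xFFFFFFFFu, a1 = a >> 32;
    uint64_t b0 = b & 0xFFFFFFFFu, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);
    fx128 r;
    r.lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}

/* Q64.64 multiply: keep bits 64..191 of the 256-bit product of the magnitudes. */
static fx128 fx_mul(fx128 a, fx128 b) {
    int neg = fx_is_neg(a) != fx_is_neg(b);
    if (fx_is_neg(a)) a = fx_neg(a);
    if (fx_is_neg(b)) b = fx_neg(b);
    fx128 ll = mul64(a.lo, b.lo);
    fx128 r = fx_add(mul64(a.lo, b.hi), mul64(a.hi, b.lo));
    fx128 low_part = { 0, ll.hi };       /* ll.lo falls below 2^-64 and is dropped */
    r = fx_add(r, low_part);
    r.hi += a.hi * b.hi;                 /* bits above 2^127 wrap silently */
    return neg ? fx_neg(r) : r;
}

/* Initialization from a double: integer part into hi, fraction scaled by 2^64 into lo. */
static fx128 fx_from_double(double x) {
    assert(x > -9.2e18 && x < 9.2e18);   /* integer part must fit in 63 bits */
    double ax = x < 0 ? -x : x;
    uint64_t ip = (uint64_t)ax;          /* truncation == floor for ax >= 0 */
    double frac = ax - (double)ip;       /* exact */
    fx128 r = { ip, (uint64_t)(frac * 18446744073709551616.0) };
    return x < 0 ? fx_neg(r) : r;
}

static double fx_to_double(fx128 a) {
    int neg = fx_is_neg(a);
    if (neg) a = fx_neg(a);
    double v = (double)a.hi + (double)a.lo * (1.0 / 18446744073709551616.0);
    return neg ? -v : v;
}
```

`fx_from_double(100.0)` initializes a value and `fx_to_double` converts back; the latter rounds each limb separately, so it is not correctly rounded in every case. Resolution is 2^-64 (about 5.4e-20) at every magnitude, and integer parts up to about 9.2e18 fit. Division is not shown; bitwise long division on the 128-bit magnitudes is one option. Adding more `uint64_t` limbs gives 192- or 256-bit variants along the same lines.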
Perhaps it would help to clarify the exponent question if you told us about the magnitudes of the numbers you need to support, and about what you expect to happen if a computation produces a result of greater magnitude or an especially small magnitude. — John Bollinger, Feb 03 '23 at 21:31
What about an arbitrary-precision decimal representation such as [`decimal.js`](https://github.com/MikeMcl/decimal.js/)? Perhaps with a convenience wrapper such as [`math.bignumber`](https://mathjs.org/docs/datatypes/bignumbers.html)? — John Bollinger, Feb 03 '23 at 21:38
@JohnBollinger WebAssembly is going to be quite a bit faster than JS, and I'm only going to be using them occasionally in my scenario, instead relying mostly on double. I know of these methods, but I would need something more low-level. — Infigon, Feb 04 '23 at 01:39
@harold Fixed-point works for my case, added it to the edit! — Infigon, Feb 04 '23 at 14:48
@Infigon in that case, can you specify what you would like the scale to be? — harold, Feb 04 '23 at 15:17
@harold The scale should be up to 100 or more for my use case. — Infigon, Feb 04 '23 at 15:32
