How to multiply double (floating point) by an integer type (32-bit, 64-bit, 128-bit, etc.) manually

Question

I am trying to manually implement multiplication between a double and a 128 bit integer that I have created myself using two ulongs.

My understanding is as follows:
1. Decompose the double into it's significand and exponent. Ensuring the significand is normalized.
2. Multiply the significand and my uint128. This will give me at 256 bit number.
3. Shift my 256 bit number by exponent extracted from the double.
4. If the value is over 128 bits, then I overflowed.

I feel like I am incredibly close, but I am missing something. Lets say I have the following example. I am storing a uint128 with the value 2^127 and I want to multiply it by 8E-6.

uint128 myValue = new uint128(2^127);
double multiplier = 8E-6;
uint128 product = myValue * multiplier;

The real value or correct answer is 1361129467683753853853498429727072.845824. So I would like to get the value 1361129467683753853853498429727072 as my 128-bit integer.

The problem is my implementation is giving me 1361129467683753792259819967610881.

int exponent; // This value ends up being -69 for 8E-6
uint128 mantissa = GetMantissa(multiplier, out exponent); // This value ends up being 4722366482869645 after normalizing it.
uint256 productTemp = myValue * mantissa; // This value is something like 803469022129495101412490705402148357126451442021826560.
uint128 product = productTemp >> exponent. // this value is 1361129467683753792259819967610881

I am using this code from extracting mantissa and exponent from double in c# to get my mantissa and exponent. I can use those values to correctly get 8E-6 back as a double.

Does anyone know what I am getting wrong here? If I using .8 instead of 8E-6 my values are better.

C# is what I am writing it in, but the language isn't too relevant as long as it has IEEE 754 floating point double precision is being used. I wrote my own UInt128, my own multiplication that handles 256 bits, and the decomposing of the double. — Chris Delpire, Jan 07 '20 at 00:39
Stepping back and looking at the big picture, I understand that you already have a data type that gets the results you want, System.Decimal, but you are hoping to get better performance. Usually, the big step is between hardware implementation and software implementation. Are you sure you are going to get enough performance gain to justify the work? — Patricia Shanahan, Jan 07 '20 at 23:43
@PatriciaShanahan I am not. The performance of System.Double is significantly faster than the performance of System.Decimal, but it is possible that System.Double will not be sufficient for the accuracy I am needing. I am looking at a specific set of operations on a specific range of values. I think in the end, I will not be any faster, but I need to have valid code to benchmark in order to confidently present that. — Chris Delpire, Jan 08 '20 at 15:12

chux - Reinstate Monica · Accepted Answer · 2020-01-07T04:22:02.590

1

what I am getting wrong here?

double multiplier does not have the arithmetic value of 0.000008. It have a dyadic value near 0.000008, to 15-17 significant decimal places. That difference accounts for not meeting your expectation.

1234567890123456
1361129467683753 853853498429727072.845824 - perceived product
1361129467683753 853853498429727072        - perceived rounded product
1361129467683753 792259819967610881        - product seen.

Try multiplier with an exact value in decimal like 0.0625 (1.0/16).

Notes:

With binary64, the closest double to 8E-6 is (@Patricia Shanahan) 0.000007999999999999999637984894607090069484911509789526462554931640625.

Multiplying that by 2¹²⁷ is exactly

1361129467683753 792259819967610880.0

So the multiplication appears to be off-by-one, perhaps rounding?

edited Jan 07 '20 at 04:22

answered Jan 06 '20 at 23:14

chux - Reinstate Monica

143,097
13
135
256

Interesting. So there were values, like 0.8, which came out as I had expected. Is there a way I can handle doubles that do not have an arithmetic value, and if so, is there a way I can tell whether a double has an arithmetic value or not? – Chris Delpire Jan 07 '20 at 00:43
1

@ChrisDelpire The only doubles that do not have an arithmetic value are the infinities and NaNs (all bits one exponent). The problem is the other way round. There are many decimal fractions that do not correspond exactly to any double, so they get approximated on decimal fraction to double conversion. – Patricia Shanahan Jan 07 '20 at 01:13
@PatriciaShanahan So does that mean that when I am storing 8E-6, it is actually storing 7.999999E-6 under the hood? How is it then, for example in C#, than when I look at the value, it is shown as 8E-6? – Chris Delpire Jan 07 '20 at 01:18
1

The closest IEEE 754 64-bit binary number to 8E-6 is 0.000007999999999999999637984894607090069484911509789526462554931640625. Most language designers feel that programmers would not want to see such things, so they shorten and round the output. – Patricia Shanahan Jan 07 '20 at 01:25
@PatriciaShanahan I agree with that! But in this case, I am trying to get as accurate as I possibly can. – Chris Delpire Jan 07 '20 at 01:57
@ChrisDelpire I don't know C#, so I can't help directly with getting exact output. In Java, I use `System.out.println(new BigDecimal(8E-6));`. BigDecimal's double constructor and its toString are both exact. – Patricia Shanahan Jan 07 '20 at 02:00
@PatriciaShanahan Well I am trying to spin up my own type for performance reasons. I actually have a solution to my problem with C#'s System.Decimal, but for my specific use case I wanted to see if I couldn't spin up my own, more performant version. – Chris Delpire Jan 07 '20 at 02:05
@Chris Delpire To be clear, it appears your multiplication has little trouble, it is simply that `multiplier` does not take on the _exact value of `8E-6`. So the "what I am getting wrong here" is thinking the multiplication is wrong when it is assuming `multiplier` is exactly `8E-6` is the significant oops. – chux - Reinstate Monica Jan 07 '20 at 04:04
@chux-ReinstateMonica yes, that seems clear now, especially when you number the digits like that. Since I'll only have 15-17 digits of significance in my double, I can only have 15-17 digits of significance in my uint128. Certain numbers cannot be stored exactly. It is unfortunate because I was hoping to be able to go back and forth using 2^127 and 2^-127. Thank you for your help. – Chris Delpire Jan 07 '20 at 04:14
@ChrisDelpire `double` as a 64-bit object using [binary64](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) can encode _exactly_ about 2^64 different values. `8E-6` is not one of them. `2^127` and `2^-127` _are_ two of them. Your multiplier does need a little work (as pointed out in the answer), but don't discredit the good work you have so far. Look for that pesky off-by-one. – chux - Reinstate Monica Jan 07 '20 at 04:20
@ChrisDelpire Tip: Print out FP values in hexadecimal notation rather than decimal to get a binary view of what values you truly have. – chux - Reinstate Monica Jan 07 '20 at 04:28
@chux-ReinstateMonica thank you for the words of encouragement. I actually have been spending a long time looking at my values as binary strings to ensure I was separating the exponent and significand parts correctly, and then correctly normalizing the decimal value. To get the accuracy I am looking for, I need my multiplication result to be 1361129467683753 853853498429727072, but looking at the bit string, it is way different than 1361129467683753 792259819967610881. I'll take a look at the values in hex rather than binary today to see if that brings me any inspiration. – Chris Delpire Jan 07 '20 at 14:19
1

@ChrisDelpire To get 136...072, you will need to multiply `2^127` by something other than a `double` - and I doubt `long double` will suffice too. So the problem becomes what type/structure to use to represent `8E-6`. – chux - Reinstate Monica Jan 07 '20 at 15:07

How to multiply double (floating point) by an integer type (32-bit, 64-bit, 128-bit, etc.) manually

1 Answers1