One approach to handling fractional values is a fixed-point representation. For example, neural networks can use a fixed-point representation alongside weight quantization to improve prediction speed and reduce storage requirements.
In fixed-point, a number is represented by an integer and a scaling factor. For example, 2.020 can be represented by the integer 2020 and the scaling factor 1/1000. Scaling factors that are powers of two are computationally convenient, since bit-shifts can be used for rescaling. For addition and subtraction of operands that share a scaling factor, the result has the same scaling factor. For multiplication, the scaling factors multiply, so the result can be rescaled with a subsequent division (or bit-shift). For division, the scaling factors divide out, which can be accounted for by scaling one of the operands beforehand.
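The rules above can be sketched in C. This is only an illustration, not part of the assembly example: the type `fix_t`, the helper names, and the choice of 8 fractional bits (scaling factor 2^-8) are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 8                  /* assumed scaling factor: 2^-8 */
#define ONE (1 << FRAC_BITS)         /* raw representation of 1.0 */

typedef int32_t fix_t;               /* hypothetical type: value = raw / 256 */

/* Addition: both operands share the scaling factor, and so does the sum. */
fix_t fix_add(fix_t a, fix_t b) {
    return a + b;
}

/* Multiplication: the product has scaling factor 2^-16, so divide by 2^8
   to get back to 2^-8. The division truncates toward zero. */
fix_t fix_mul(fix_t a, fix_t b) {
    return (fix_t)(((int64_t)a * b) / ONE);
}

/* Division: the scaling factors would cancel, so pre-scale the dividend
   by 2^8 to keep the quotient at 2^-8. The division truncates. */
fix_t fix_div(fix_t a, fix_t b) {
    assert(b != 0);
    return (fix_t)(((int64_t)a * ONE) / b);
}
```

For instance, 3.5 is raw 896 and 2.25 is raw 576. `fix_add` gives raw 1472 (5.75), `fix_mul` gives raw 2016 (7.875), and `fix_div` gives raw 398 (about 1.5547, where the exact quotient is about 1.5556).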
For example, here's an implementation in x86-64 assembly for the example in the question, which results in 47 in rax and 28 in rbx. I believe that division (and right bit-shifts) used as-is can introduce a bias, since the result is truncated rather than rounded. The code below does not handle this in general, but it does for the final step of rounding the decimal part to two digits (otherwise the decimal part would truncate to 27). Adding 1/2 (in units of the rescaled result) prior to a right bit-shift would result in rounding when rescaling. Adding (divisor - 1) / 2 to a dividend would result in rounding for division. However, the error incurred from truncating may be small relative to other sources of error.
mov rax, 123 ; rax = 123
imul rax, rax ; rax *= rax
shl rax, 8 ; use scaling factor of 2 ^ -8 for the numerator
mov rbx, 320 ; rbx = 320
xor rdx, rdx ; set numerator high bits to 0 for division
div rbx ; rax /= rbx
movzx ebx, al ; copy fractional part (low 8 bits of rax) into rbx, zero-extended
shr rax, 8 ; rescale to unit scaling factor
imul rbx, 100 ; scale the fraction so its integer part is the first two decimal digits
add rbx, 0x80 ; add 1/2 so the shift below rounds instead of truncating
shr rbx, 8 ; drop fraction from rbx
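As a cross-check, the same arithmetic can be written in C. This is a rough sketch mirroring the assembly above, not a general-purpose routine: `fixed_result`, `shr_round`, `div_round`, and `square_over` are hypothetical names introduced here. Like the assembly, it does not carry into the integer part if the two decimal digits round up to 100.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: integer part and two rounded decimal digits. */
typedef struct {
    uint64_t whole;
    uint64_t hundredths;
} fixed_result;

/* Right shift that rounds to nearest by first adding half of 2^bits. */
uint64_t shr_round(uint64_t x, unsigned bits) {
    assert(bits >= 1 && bits < 64);
    return (x + (UINT64_C(1) << (bits - 1))) >> bits;
}

/* Division that rounds to nearest by first adding (d - 1) / 2 to the
   dividend; with an even divisor, an exact tie rounds down. */
uint64_t div_round(uint64_t a, uint64_t d) {
    assert(d != 0);
    return (a + (d - 1) / 2) / d;
}

/* Computes n * n / d with 8 fractional bits, as the assembly does. */
fixed_result square_over(uint64_t n, uint64_t d) {
    assert(d != 0);
    uint64_t q = ((n * n) << 8) / d;      /* quotient, scaling factor 2^-8 */
    uint64_t frac = q & 0xFF;             /* fractional part (low 8 bits) */
    fixed_result r;
    r.whole = q >> 8;                     /* rescale to unit scaling factor */
    r.hundredths = shr_round(frac * 100, 8);
    return r;
}
```

With `n = 123` and `d = 320`, the scaled quotient is 12103, so the integer part is 47 and the fractional byte is 71. Then 71 * 100 = 7100, which rounds to 28 after the shift and would truncate to 27 without the added half.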