How do I save a floating-point number in 2 bytes?

Question

Yes I'm aware of the IEEE-754 half-precision standard, and yes I'm aware of the work done in the field. Put very simply, I'm trying to save a simple floating point number (like 52.1, or 1.25) in just 2 bytes.

I've tried some implementations in Java and in C# but they ruin the input value by decoding a different number. You feed in 32.1 and after encode-decode you get 32.0985.

Is there ANY way I can store floating point numbers in just 16-bits without ruining the input value?

Thanks very much.

What range of numbers do you need to encode, and how many significant digits do they have? Consider decimal fixed- or floating-points. — CodesInChaos, May 02 '12 at 13:39
Can you store it as an `unsigned short`, with some bits being used for the exponential part? You would then manually convert back from this format to a regular single precision `float`. — Matthew, May 02 '12 at 13:40
You need to be more specific with your problem statement. Two bytes can represent at most 65536 different floating point values. Which ones are the ones you want? Worst case, you can have a table of "the 65536 floating point values I care about" and make the two-byte value a lookup. — Raymond Chen, May 02 '12 at 13:43

Guffa · Accepted Answer · 2012-05-02T16:08:27.333

11

You could store three digits in BCD and use the remaining four bits for the decimal point position:

52.1 = 521 * 10 ^ -1 => 0x1521
1.25 = 125 * 10 ^ -2 => 0x2125

This would give you a range from 0.0000000000000001 to 999. You can of course add an offset for the decimal point to get for example the range 0.0000000001 to 999000000.

Simple implementation of four bit used for decimal point placement, and the rest for the value. Without any error check, and not thoroughly checked. (May have precision issues with some values when using != to compare doubles.)

public static short Encode(double value) {
  int cnt = 0;
  while (value != Math.Floor(value)) {
    value *= 10.0;
    cnt++;
  }
  return (short)((cnt << 12) + (int)value);
}

public static double Decode(short value) {
  int cnt = value >> 12;
  double result = value & 0xfff;
  while (cnt > 0) {
    result /= 10.0;
    cnt--;
  }
  return result;
}

Example:

Console.WriteLine(Encode(52.1));
Console.WriteLine(Decode(4617));

Output:

4617
52.1

edited May 02 '12 at 16:08

answered May 02 '12 at 13:43

Guffa

687,336
108
737
1,005

@Geotarget: You could squeeze 4 digits into two bytes, but then you only have two bits left for describing where the decimal point is. For numbers with less digits you just fill with zeroes, i.e. `1.5` is the same as `001.5` or `1.500`. – Guffa May 02 '12 at 13:52
Can you show examples of float to binary encode/decode functions? I'm sorry but I don't really understand what's hapenning. – Robin Rodricks May 02 '12 at 13:55
@Geotarget: I added a simple implementation above. – Guffa May 02 '12 at 14:06

score 8 · Answer 2 · answered May 02 '12 at 13:40

8

C# has no built in functionality for that, but you could try a fixed point approach.

Example of 8,8 Fixed point (8 before comma, 8 after):

float value = 123.45;
ushort fixedIntValue = (ushort)(value * 256);

This way, the number is stored like this: XXXXXXXX,XXXXXXXX

and you can retrieve the float again using this:

float value = fixedIntValue / 256f;

answered May 02 '12 at 13:40

bytecode77

14,163
30
110
141

1

That also has a limited precision. 52.1 becomes 52.09765625. – Guffa May 02 '12 at 13:45
1

Well, you can't have everything. If you want more, you could either try 6,10 fixed point or use 4 bytes. – bytecode77 May 02 '12 at 15:44
1

The op doesn't ask for everything, just to get back exactly the same value. This is not at all unreasonable, if a limited range is acceptable. You just have to use a different approach than a binary floating/fixed point number. – Guffa May 02 '12 at 16:03

score 5 · Answer 3 · answered May 02 '12 at 13:40

Are you sure you need such a micro-optimization, over simply using a float or double?

Would you be better served by storing a short and understanding that, e.g., it's divided by 100 to make the actual number? (E.g. your examples of 52.1 and 1.25 could be stored as 5210 and 125) I think this might be the best solution for you.

If you're set on using an actual floating-point number, you can take the decoded number and round it to x significant digits, (from your example, 3) which should usually get you back the same number you started with (note that yes, that's intentionally vague - you can't guarantee getting the original unless you store the original).

I can see this being used for network communication in e.g. games. They don't need to be super precise for sending for example position data but network traffic is heavily limited and the difference between 2 bytes and 4 bytes is really noticeable when you have to service hundreds of players multiple times per second. — Keya Kersting, May 03 '20 at 18:38

Oliver Charlesworth · Answer 4 · 2012-05-02T13:44:12.480

4

The problem is that you can't precisely represent 32.1 in any binary floating-point type.

In single-precision, the closest representable value is 32.099998. In half-precision, it's apparently 32.0985.

You could consider a decimal floating-point type, but this solution is not unique to half-precision.

edited May 02 '12 at 13:44

answered May 02 '12 at 13:38

Oliver Charlesworth

267,707
33
569
680

A half-precision value uses 11 bits for the significand (the leading bit 1 is implicit). In the interval [32,64), 6 of those bits are used for the integer part, leaving 5 bits for the fractional part. So in that domain, [32,64), the representable values are exactly the mulitples of 1/(2**5) = 1/32. The closest one to `32.1` will be 32+3/32 (a.k.a. 1027/32) which is `32.09375`. So your "apparently" is not correct, after all. I do not know where the asker has his example from. For a half-precision value, you would normally output only 3 decimal digits, so `"32.1"` would be the usual precision. – Jeppe Stig Nielsen Apr 08 '16 at 10:30

score 1 · Answer 5 · edited Apr 22 '18 at 14:39

1

From your examples you want to store 3 digits and a decimal point. You could simply encode your 'alphabet' of 11 symbols into a 4-bit code, and store 4 x 4 bits in 2 bytes.

edited Apr 22 '18 at 14:39

MattDavey

8,897
3
31
54

answered May 02 '12 at 13:39

High Performance Mark

77,191
7
105
161

score 1 · Answer 6 · answered May 02 '12 at 20:49

There are 4,278,190,080 32-bit floating-point values, not including NaNs and infinities. There are 65,536 values for the 16 bits in two bytes. Clearly, it is impossible to uniquely encode all the floating-point values in two bytes.

Which ones do you want to encode?

Even for a single value of the sign and exponent (e.g., all floating-point values from 4 to 8, not including 8), there are 8,388,608 floating-point values, so you cannot even encode those in two bytes.

You have to restrict yourself to a small subset of values to encode. Once you have done that, people may have suggestions about how to encode them. What is the actual problem you are trying to solve?

How do I save a floating-point number in 2 bytes?

6 Answers6

Linked