Why are float numbers rounded up in Cassandra?

Question

I created Cassandra table with column type: DataType.FLOAT.

Execute my SQL using CqlSession:

CqlSessionBuilder builder = CqlSession.builder();
        builder.addContactPoint(new InetSocketAddress(properties.getHost(), properties.getPort()));
        builder.withLocalDatacenter(properties.getDatacenter());
        builder.withAuthCredentials(properties.getUsername(), properties.getPassword());
        builder.build();

But when I insert float numbers, it's rounded up:

12334.9999 -> 12335.0.
0.999999   -> 0.999999
12345.9999 -> 12346.0

It seems like Cassandra rounds the float and consider the number of all digits, not only after the point.

What are the options to solve this problem? I know that I can use Decimal datatype, but may be you have other solution?

Aaron · Answer 1 · 2022-03-15T13:48:19.860

I actually covered this issue with Apache Cassandra and DataStax Astra DB in an article I wrote last month:

The Guerilla Guide to Building E-commerce Product Services with DataStax Astra DB

So the problem here, is that FLOAT is a fixed floating point precision type. This means that when the numeric values are converted from base-10 (decimal) to base-2 (binary), each one of the 32 binary precision points must have a value (zero or one, obviously). It's during this conversion process between base-2 and base-10 that rounding errors occur. The likelihood of a rounding error increases as the value does (on either side of the decimal point).

What are the options to solve this problem? I know that I can use Decimal datatype, but may be you have other solution?

Well, you mentioned the best solution (IMO), which to use a DECIMAL to store the value. This works, because DECIMAL is an arbitrary floating point type. The values in a DECIMAL type are stored in base-10, so there's no conversion necessary and only the required precision is used.

Before arbitrary precision types came along, we used to use INTEGERs for things that had to be accurate. The first E-commerce team I worked on stored product prices in the DB as pennies, to prevent the rounding issue.

Yes, both INT and FLOAT are fixed precision types, but an INT stores whole numbers, and all of its precision points can be used for that. Therefore the usage patterns of the bits are quite different. While both INT and FLOAT allocate a bit for the "sign" (+/-), with floating point numbers the remaining 31 precision points are pre-allocated for the full numeric value and its exponent.

So your example of 12334.9999 is essentially stored in Cassandra like this:

123349999 x 10^-4

And of course, that's stored in binary, which I won't include here for brevity.

tl;dr;

Basically FLOATs use fixed precision to store values as a formula (significand and exponent) in base-2, and the conversion back to base-10 makes rounding errors likely.

You're right, use a DECIMAL type. When you need to be exact, that's the only real solution.

If you're interested, here are two additional SO answers which provide more detail on this topic:

This is a super-awesome answer! – Erick Ramirez Mar 16 '22 at 04:13 — Erick Ramirez, Mar 16 '22 at 04:13

Why are float numbers rounded up in Cassandra?

1 Answers1