8

I'm running into a baffling issue with a basic MySQL query.

This is my table:

id | rating
1  | 1317.17
2  | 1280.59
3  | 995.12
4  | 973.88

Now, I'm attempting to find all rows where the rating column is larger than a certain value. If I try the following query:

SELECT * FROM (`users`) WHERE `rating` > '995.12'

It correctly returns 2.

But, if I try

SELECT * FROM (`users`) WHERE `rating` > '973.88'

it returns 4! So it's as if it thinks the 973.88 in the table is greater than 973.88, but it doesn't make the same mistake with 995.12. This happens regardless of whether I run the query from a PHP script or in phpMyAdmin.

Any ideas?

O. Jones
  • 103,626
  • 17
  • 118
  • 172
sveti petar
  • 3,637
  • 13
  • 67
  • 144

3 Answers3

24

Decisions and consequences

This is the consequences that you've got because you decided to use floating-point data type. Floats are not precise. And that means: yes, you can result in a>a = true

For instance, your fourth row:

mysql> SELECT *  FROM t WHERE id=4;
+------+--------+
| id   | rating |
+------+--------+
|    4 | 973.88 |
+------+--------+
1 row in set (0.00 sec)

I've left data type as you've posted, it's FLOAT. Here we are:

mysql> SELECT rating>973.88 FROM t WHERE id=4;
+---------------+
| rating>973.88 |
+---------------+
|             1 |
+---------------+
1 row in set (0.00 sec)

Oops!


Behind the screen

Why? To understand why it is so, you should realize how floating-point data type is represented. Long story is here. But - I'll take a brief overview.

Here how it is represented: enter image description here where:

  • s is the sign
  • b is the base. It's meaning is same as radix
  • e is the exponent.

That means we can represent one number in different ways - and that depends of which base we'll chose. Most common is b=2. But not all real numbers can be represented exactly with this base, even if in decimal base they look "good". Famous example is 0.1 - which can not be represented in b=2 precisely - so it is stored approximately. Again, long story you can see here - but I'll just note, that it's impossible to represent it precisely with base 2.

The result is: even if number is precise in decimal radix, it still may be impossible to represent it precisely - and, therefore, it will be stored approximately. That's how it works and, in fact, this is intended - because of structure of floats itself.


What to do

Fixed precision

Well, first, you should ask yourself: do you really need float? Attention: I said: float. Because - there are also fixed point numbers. They will represent number with fixed precision. To say it easy: with fixed-point data type you may be sure that you'll store exactly what you see on the screen. So if it's 973.88 - then it's 973.88 and not 973.8800000439234. Moving to the deal:

mysql> ALTER TABLE t CHANGE rating rating DECIMAL(8,2);
Query OK, 4 rows affected, 4 warnings (0.47 sec)
Records: 4  Duplicates: 0  Warnings: 4

and ..

mysql> SELECT rating>973.88 FROM t WHERE id=4;
+---------------+
| rating>973.88 |
+---------------+
|             0 |
+---------------+
1 row in set (0.00 sec)

TADA! Magic happens. Your number is now stored with fixed precision, thus, such comparison failed.

Using float

Then, may be there are use-cases when you're stuck with floats (however, in case of DBMS it's hard for me to remember even one such use-case - if only not the case with large amount of calculations, which may cause performance impact, see description below). Then there's still a way to make it work. You should decide what precision is applicable for you. That is: from which point will you treat numbers as equals.

You're storing only two significant digits, so I assume that precision of 1E-5 would be more than enough. Then, your query will look like:

mysql> set @eps=1E-5;
Query OK, 0 rows affected (0.00 sec)

and use it with:

SELECT * FROM t WHERE rating>973.88+@eps

which will result in

+------+---------+
| id   | rating  |
+------+---------+
|    1 | 1317.17 |
|    2 | 1280.59 |
|    3 |  995.12 |
+------+---------+

Which is better?

To realize this, you'll need to look under cover once again. I've given a brief overview of what float data type is and why it isn't precise. However, fixed data type also has it's weakness. May be it's not the thing of which we should worry in context of DBMS, but I'll mention it: fixed data type, in general, will cause performance impact. And it will depend of how much calculations will you do in your DBMS.

In MySQL, fixed-point data types (such as DECIMAL) are implemented as BCD strings (so to make long story short - again, here's wiki link). That means in comparison to float it will cause performance issues. But if you're not going to do calculations in DBMS too often, then that impact won't be even noticeable - I've mentioned it because both types, float- and fixed-point have their own issues.


Conclusion

DBMS, like all the other computer stuff, isn't perfect. It's just using some internal things to do the work. That means: in some cases you'll have to realize how that internal things work to understand why did you got some odd result.

In particular, floats are not precise. Yes, there are tons of answers like this in the Internet, but I'll repeat. They are not precise. You should not rely on precision when it's about floats. And - in almost all DBMS there are fixed-point data types. And - in cases like yours you should use them. They will do just same work, but with them you'll be sure about selected precision.

However, you may want to use floats - if you're going to make too much calculations in your DBMS. But, on the other hand, that is about - why are you going to do that? Why do not use application to produce those calculations (and, therefore, avoid both performance impact of using fixed-point data-types and presicion problems with floats - because using fixed-point with average amount of calculations is ok)

Community
  • 1
  • 1
Alma Do
  • 37,009
  • 9
  • 76
  • 105
  • Thank you for the in-depth answer. I've switched the data type and the problem is gone. This makes me wonder where else I've made the same mistake of relying on `FLOAT` data type, since I've never used `DECIMAL` before, not realizing the inaccuracy of floats. – sveti petar Apr 28 '14 at 13:51
  • 1
    As I've posted, using floating-point data type has little sense in context of DBMS. In most cases in-built precision of fixed data type will be more than enough – Alma Do Apr 28 '14 at 13:56
  • -1 Suggesting that one should always use fixed point types unless there is an overriding reason to use floats is *generally speaking* **very bad advice**. Most measurements are approximations and only precise to a certain number of significant figures, which is *exactly* what a float represents (a `FLOAT` has over 7 decimal sigfigs of precision whilst a `DOUBLE` has almost 16). As you point out, just because a value appears to be "round" in base-10 is nothing special: it won't be round in many other bases (and our choice to use base-10 is pretty arbitrary). – eggyal May 14 '14 at 23:03
  • Moreover, strictly speaking a fixed-point type is *no more accurate* than a floating-point type: in MySQL's case it can simply offer more significant figures of precision (at the cost of a massive reduction in representable range, reduced storage efficiency, and worse manipulation performance). Besides which, most application code will be using floats and so values will be converted (with accompanying precision issues) in any event. – eggyal May 14 '14 at 23:08
  • When one remembers that MySQL's fixed point type has its own issues, e.g. [overflow handling](http://sqlfiddle.com/#!2/4a38f/1), it is readily apparent that one is better off understanding the pros and cons of *both* datatypes and selecting that which is most suited to the type of data being stored. In 99% of cases, that'll be a float: but yes, that does mean one has to handle them appropriately. Doing anything else is tantamount to storing a date in a string simply because it's "too difficult" to use the `DATE` type properly. – eggyal May 14 '14 at 23:10
  • First, I didn't say anything about - that fixed-point can hold values _with higher precision_. It's just about storing values with _predictable_ precision, on which you can rely. Your fiddle is showing another thing: you've chosen `numeric` there without pointing to decimal sizes (`numeric` is just `decimal` in mysql) - I strongly recommend to avoid such definitions - because - yes, it will cause unexpected behavior on edge cases. This doesn't mean that `decimal(x,y)` won't cause it - but with implicit definition it will be clear that type has it's ranges and that one should take care of. – Alma Do May 15 '14 at 06:13
  • Next,application has literally _nothing to do_ with the issue. Because in this(and all similar cases)-we're doing our calculation __through DMBS__, so only DBMS-related stuff matters.If that was about application somehow,then the question would state it(and it doesn't state it).More,no matter what type is chosen for application data storing,if we'll lose precision in DBMS calculation,there will be no chance to restore it in application.As I've said,if float is inevitable,it should be used-but in 99% cases that's just not so __TL;DR: what you're saying is not true or not related to the issue__ – Alma Do May 15 '14 at 06:18
  • By recommending that one should prefer fixed-point over floating-point (and then closing as dupes of this question others that are about floating-point issues during data retrieval), you are apparently providing advice that applies well beyond simply conducting comparison operations within the RDBMS. It is within *that* context that I raised the issue of data conversion in the application: so I'm afraid it *is* very much a relevant point. – eggyal May 15 '14 at 17:07
  • If I was wrong - then only in one single case with one single cv to this question. _May be_ in context of _that_ question this was not a good suggestion. But, paying attention to circumstances, given in context of _this_ question, my answer is complete, well-formed and well-explained. And, by the way, I've made a remark, that _may be_ floating point will be needed (so even if _I can not remember an arbitrary use-case, it doesn't mean that there's no such use case_ - which I've pointed explicitly) – Alma Do May 15 '14 at 17:11
  • You're missing my overall point. FLOATS are *the correct type* to use in 99% of use cases. This answer encourages people to use *incorrect types*, which is bad. – eggyal May 15 '14 at 17:14
  • So you're missing my overall point. FLOATS _are useless_ in 99% of use cases if say about DBMS. Fixed-point data type is in 99% cases - what is intended. Because DB should _store_ data and that means all unpredictable or edge cases should be minimal. Since fixed-point data types have no _significant_ storage-engine impact, no issues with precision e t. c. - they should be used to fulfill this goal. I just can not understand how that could be unclear. If you disagree with the answer - so be it. But that can not change facts. – Alma Do May 15 '14 at 17:37
  • I cannot help but think you simply don't understand what floating point numbers are, or how they work. They offer *more* precision than fixed-point numbers (and do so more consistently) in any situation where the values might occupy a range of magnitudes. Once one understands them, they are neither less predictable nor have any more edge cases than fixed-point types. As for storage/performance impact of fixed point types, see [this answer](http://stackoverflow.com/a/20887107). Beyond that, we shall just have to agree to disagree. – eggyal May 15 '14 at 22:32
  • Well, performance impact is debatable and for each DBMS is a unique case. For now I'm thinking just about that (because I'm in doubt as well as the answer is formed without mentioning specific case). But as for precision - I know that floats have more natural "precision" - but in case of DBMS in 99% you just don't need that. So it's not a benefit. It's just a thing that in 99% cases won't be used while dealing with approximate storage of "naturally good numbers" for floats _is a problem_. I'll think about mentioning performance impact more, but it's, again, debatable. – Alma Do May 16 '14 at 06:40
  • I've revised [my answer](http://stackoverflow.com/a/12977067) to that other question a little more. Hopefully it helps to explain my argument a little better than the short confines of these comment boxes? – eggyal May 16 '14 at 21:55
2

@Hituptony is correct. Try this:

SELECT * FROM (`users`) WHERE `rating` > 973.88

SQL FIDDLE

Ilesh Patel
  • 2,053
  • 16
  • 27
1

You do not need the single quotes when you're comparing numbers.

Take the single quotes off and try again.

However, this has been noted as something you can solve with ROUND(SUM(column), 2) * 1

if the single quotes don't work, compare it to the value of this ^^

See link: https://dev.mysql.com/doc/refman/5.0/en/problems-with-float.html

Hituptony
  • 2,740
  • 3
  • 22
  • 44