1

I encountered a floating-point imprecision issue in Awk that I can't solve. Is there a simple solution to it?

Here is my example Awk script to replicate the floating-point imprecision issue.

BEGIN {
  print "PREC = " PREC
  print "OFMT = " OFMT
  print "CONVFMT = " CONVFMT
  a = 1.2 + 3.4
  b = 8.9 - 4.3
  print "a = " a
  print "b = " b
  if ( a == b )
    print "a == b"
  else
    print "a != b"
  c = 3.2 + 5.4
  d = 9.8 - 1.2
  print "c = " c
  print "d = " d
  if ( c == d )
    print "c == d"
  else
    print "c != d"
}

Here is the output of the above script.

PREC = 53
OFMT = %.6g
CONVFMT = %.6g
a = 4.6
b = 4.6
a != b
c = 8.6
d = 8.6
c == d

Why is a != b even though both print the same value? Yet c == d behaves as expected.

I assume Awk has some internal floating-point imprecision. FYI, I'm using Gawk 4.1.4.

I tried various values for PREC, OFMT & CONVFMT, but failed to find ones that would work.

E.g. Changed OFMT & CONVFMT to %.6f:

PREC = 53
OFMT = %.6f
CONVFMT = %.6f
a = 4.600000
b = 4.600000
a != b
c = 8.600000
d = 8.600000
c == d

E.g. Changed PREC to 16:

PREC = 16
OFMT = %.6g
CONVFMT = %.6g
a = 4.6
b = 4.6
a != b
c = 8.6
d = 8.6
c == d

Basically, I'm hoping for some settings inside BEGIN, instead of changing every expression where floating-point arithmetic & comparison occur, since my actual Awk script is much longer than the example above.

E.g. I would rather not have to use sprintf for each arithmetic & comparison expression, or convert each input number to an integer after scaling by 1e6 & scale each output number back by 1e-6. Such an approach would be very daunting.

FYI, floating-point numbers in input files will have maximum 6 decimal points, but they may be without decimal points, i.e. they range from 0 to 6 decimal points.

Thank you for your help.

HCN
  • "have maximum 6 decimal points" --> can you scale each input by 1,000,000 and round, then perform your math? – chux - Reinstate Monica Dec 05 '20 at 21:35
  • @chux I was hoping it wouldn't come to that, as previously stated. Besides the final output, the program will also print intermediate values, so I'd have to include the conversion at each print. – HCN Dec 05 '20 at 22:22
  • Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – President James K. Polk Dec 06 '20 at 03:02
  • @PresidentJamesK.Polk Not exactly. I kind of knew that the issue was floating-point related, albeit not exactly how. I'm looking less for an academic explanation and more for a simple & elegant solution in Awk that doesn't require explicitly handling the imprecision at every arithmetic expression. I was hoping scripting languages like Awk would free users from having to deal with the tedious & messy issues that exist in compiled languages like C++, including types, declarations, array bounds, unsigned vs signed numbers, max integer values, etc., at the small price of run-time overhead. – HCN Dec 06 '20 at 03:58
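
The scaling approach suggested in the comments could be sketched roughly as follows. This is only an illustration, not something posted by any of the commenters: `scaled_integer_demo` and `to_micro` are made-up names, and the sketch assumes awk stores numbers as IEEE-754 doubles and that inputs have at most 6 decimal places.

```shell
# Hypothetical demo: do the arithmetic on integer counts of millionths.
scaled_integer_demo() {
  awk '
    # Hypothetical helper: convert a decimal with at most 6 places to an
    # integer count of millionths, rounding to the nearest integer.
    function to_micro(x) { return int(x * 1000000 + (x < 0 ? -0.5 : 0.5)) }
    BEGIN {
      a = to_micro(1.2) + to_micro(3.4)
      b = to_micro(8.9) - to_micro(4.3)
      print (a == b ? "a == b" : "a != b")
      # Scale back only when printing.
      printf "a = %.6f\n", a / 1000000
    }'
}

scaled_integer_demo
```

Sums and differences of such integers stay exact as long as they remain below 2^53, which holds for values under 1e9 scaled by 1e6. Multiplying by a non-integer factor would bring fractions back, so the result would need to be rounded again.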

3 Answers

1

Here the higher precision is working against you. Since some decimal values cannot be represented exactly in binary, extra precision only pushes the equality test out to more digits, where the values still differ.

For example, at 53-bit precision (the PREC = 53 shown in your output), you get

1.2 => 1.199999999999999955591079014993738383054733
3.4 => 3.399999999999999911182158029987476766109467
8.9 => 8.900000000000000355271367880050092935562134
4.3 => 4.299999999999999822364316059974953532218933

a = 4.599999999999999644728632119949907064437866
b = 4.600000000000000532907051820075139403343201
a != b

3.2 => 3.200000000000000177635683940025046467781067
5.4 => 5.400000000000000355271367880050092935562134
9.8 => 9.800000000000000710542735760100185871124268
1.2 => 1.199999999999999955591079014993738383054733
c = 8.600000000000001421085471520200371742248535
d = 8.600000000000001421085471520200371742248535
c==d

My suggestion is to set PREC to a more reasonable value (based on the precision of your input data). I think 10 would be a good tradeoff with minimal code change. Note that gawk honors PREC only when it is run with -M (--bignum).

'BEGIN{PREC=10; ...

NB. If you ask why c and d match: notice that all of their fractions are multiples of 0.2, whereas a and b involve a 0.3.
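
The stored values listed in this answer can be inspected directly by printing with more digits than OFMT allows. The following is a minimal sketch, assuming awk uses IEEE-754 doubles; `show_stored_values` is a made-up name, and 17 significant digits are used because that is enough to tell any two distinct doubles apart.

```shell
# Hypothetical helper: print the sums from the question with 17 significant
# digits instead of the 6 that the default OFMT shows.
show_stored_values() {
  awk 'BEGIN {
    a = 1.2 + 3.4; b = 8.9 - 4.3
    c = 3.2 + 5.4; d = 9.8 - 1.2
    printf "a = %.17g\nb = %.17g\n", a, b
    printf "c = %.17g\nd = %.17g\n", c, d
    print (a == b ? "a == b" : "a != b")
    print (c == d ? "c == d" : "c != d")
  }'
}

show_stored_values
```

Under that assumption, a and b should print as 4.5999999999999996 and 4.6000000000000005, while c and d both print as 8.6000000000000014, which matches the comparison results in the question.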

karakfa
0

Floating point numbers aren't exact. The answers displayed are rounded off and aren't exactly what the floating point representation holds, but the test for equality compares every bit of the results.

As an example, try dividing 1 by 3 with pencil and paper: you get 0.3333333... until you run out of paper. Now multiplying that by 3 should give you 1.0, right? No, you'll get 0.9999999999...

Similarly, floating point can't exactly represent 0.1.

What's generally done is to compare equality to be within a certain limit, called an "epsilon".

diff = a - b
if ((diff < 0 ? -diff : diff) < 0.0000001)   # awk has no built-in abs()
   print "Equal"
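
A self-contained version of that tolerance test might look like the sketch below. `compare_with_tolerance` and `approx_equal` are made-up names, and the tolerance of 0.0000001 is simply the value used above. Whether any fixed tolerance is safe depends on the size of the numbers and on the operations performed, as the comments under this answer point out.

```shell
# Hypothetical demo: exact comparison versus comparison within a tolerance.
compare_with_tolerance() {
  awk '
    # Hypothetical helper: true when a and b differ by less than tol.
    # diff is a local variable (awk idiom: extra parameter).
    function approx_equal(a, b, tol,    diff) {
      diff = a - b
      if (diff < 0) diff = -diff
      return diff < tol
    }
    BEGIN {
      a = 1.2 + 3.4; b = 8.9 - 4.3
      print (a == b ? "exact: equal" : "exact: not equal")
      print (approx_equal(a, b, 0.0000001) ? "tolerance: equal" : "tolerance: not equal")
    }'
}

compare_with_tolerance
```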

https://www.youtube.com/watch?v=PZRI1IfStY0

Arthur Kalliokoski
  • All finite floating point numbers are exact - just like all integers. It is the math that many find inexact. – chux - Reinstate Monica Dec 05 '20 at 21:32
  • Please [do not recommend](https://stackoverflow.com/questions/13940316/floating-point-comparison-revisited) that people [compare values with a tolerance](https://stackoverflow.com/questions/17333/what-is-the-most-effective-way-for-float-and-double-comparison). Even the name, comparing with an “epsilon,” is a misnomer. – Eric Postpischil Dec 05 '20 at 21:35
  • Re “floating point can't exactly represent 0.1”: 1•10^−1 is a floating-point representation of .1, exactly. – Eric Postpischil Dec 05 '20 at 21:39
  • @EricPostpischil Given that the precision of the inputs is known, as is the precision of the outputs, wouldn't comparison to a tolerance in this case be a valid approach? – beaker Dec 05 '20 at 21:45
  • @beaker: Is the OP only doing subtraction as they have shown, or is that just a sample constructed for the purpose of posting to Stack Overflow? If they have more complicated arithmetic in their actual application, we do not know what the error bounds are. Plus they have told us a limit on the precision of the input numerals (maximum six digits after the decimal point) but not on their range (how many digits before the decimal point). – Eric Postpischil Dec 05 '20 at 22:05
  • So, handling the imprecision at each arithmetic & comparison expression in Awk, which I've dreaded, is the only solution? No simpler solution? So, it sounds like Awk doesn't offer any advantage when floating-point arithmetic & comparison are involved. Am I on point or way off? – HCN Dec 05 '20 at 22:07
  • @user14771043: There is no general solution for making computer arithmetic behave like real-number arithmetic. There are entire books, classes, and papers on the subject. There are simple solutions for simple cases. You have not specified your application well enough for a solution to be recommended. Why are you subtracting values read from input? Why are you comparing them? Is that the only arithmetic you perform? How big can the numbers be? – Eric Postpischil Dec 05 '20 at 22:09
  • @EricPostpischil The program involves basic arithmetic of addition, subtraction & multiplication. The maximum value of numbers is expected to be smaller than 1e9. Is there a simpler solution than handling tolerance at each expression in Awk? – HCN Dec 05 '20 at 22:14
  • @HCN: Arbitrarily large errors can be created with those operations. Quite simply, consider that you already know 9.8−1.2−8.6 does not give zero. Repeatedly multiply that non-zero error by something greater than one, and it will grow forever. I do not know what you are doing in this file. Maybe you are subtracting and carrying over the difference to the next line and multiplying that by something (as in compound interest). So multiplying that non-zero error by something in each line for a thousand lines can grow the error to any size. A better description of the application is needed. – Eric Postpischil Dec 05 '20 at 23:38
  • @EricPostpischil The program doesn't do any cumulative or exponential calculations. Basically, it reads in a bunch of files with similar format but varying contents, extracts numbers, performs calculations on them & tabulates the outputs in CSV format. Examples of calculations are the sum of 3 numbers, the difference between 2 numbers, the average of 5 numbers, multiplication of a number by a certain scale (ranging from 0.1 to 100), etc. However, some calculations conditionally depend on the results of other calculations, hence the comparison between 2 floating-point numbers shown in the example above. – HCN Dec 06 '20 at 01:41
  • @EricPostpischil BTW, I know I can solve the floating-point imprecision issue for my purpose using sprintf( "%.6f", ) before assigning each expression to a variable, but there will be lots of sprintf everywhere. I'm hoping Awk has some easy setting inside BEGIN which will implicitly perform sprintf for all arithmetic expressions afterwards. Unfortunately, OFMT & CONVFMT appear to apply only to displayed values, not to expression evaluation. – HCN Dec 06 '20 at 01:46
  • @HCN: If the maximum magnitude of the numbers is M, the rounding error incurred by converting them to IEEE-754 binary64 (which gawk probably uses, but that should be checked) with correct rounding-to-nearest is not more than M•2^−53. If you average five such numbers, there could be cumulative errors of five times that plus four more errors from the sums, and another error in the division by 5. Then, if you multiply by 100, that multiplies the error by about 100 (possibly very slightly more, due to yet another rounding required in the multiplication)… – Eric Postpischil Dec 06 '20 at 22:15
  • @HCN: So, information like that can be used to calculate bounds on the errors and tell whether “comparing with a tolerance” will always produce correct results. But you have still not told us M, in spite of being asked. To get an answer to a problem, **you must specify the problem**. That includes being specific about bounds and about sequences of operations performed, not general descriptions of “calculates them” and “tabulates outputs.” – Eric Postpischil Dec 06 '20 at 22:17
  • @EricPostpischil I think you missed my reply yesterday, in which I mentioned 1e9 as the maximum value of numbers expected from input. The round-off error for the average of 5 numbers is not critical for the intended purpose, as long as the values are rounded off consistently. The decision for subsequent calculations is based on comparison of those results. An example of a part of the computations -- assume A, B, C, D, E, F, G, H are read from input. If average( A, B, C ) == average( D, E, F ) then X = ( A - D ) + ( B - E ) + ( C - F ) else X = ( A + B + C ) * G - ( D + E + F ) * H. – HCN Dec 07 '20 at 02:21
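
The sprintf workaround mentioned in the comments above can at least be confined to one small helper, so that each expression is wrapped in a function call rather than a full sprintf. This is a sketch only: `round_then_compare` and `r6` are made-up names, and it assumes that rounding every result to 6 decimal places is acceptable for the application.

```shell
# Hypothetical demo: round each result to 6 decimals before comparing.
round_then_compare() {
  awk '
    # Hypothetical helper: round x to 6 decimal places; adding 0 turns the
    # formatted string back into a number so that == compares numerically.
    function r6(x) { return sprintf("%.6f", x) + 0 }
    BEGIN {
      a = r6(1.2 + 3.4); b = r6(8.9 - 4.3)
      c = r6(3.2 + 5.4); d = r6(9.8 - 1.2)
      print (a == b ? "a == b" : "a != b")
      print (c == d ? "c == d" : "c != d")
    }'
}

round_then_compare
```

Both results are formatted to the same string before being converted back, so they end up as the same stored number and the comparison succeeds.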
0

The GNU Awk User's Guide, in the section on setting the precision, says:

If you need to represent a floating-point constant at a higher precision than the default and cannot use a command-line assignment to PREC, you should either specify the constant as a string, or as a rational number, whenever possible.

beaker
Daweo