Create a precise atof() implementation in c

Question

I have written an atof() implementation in C and I am seeing rounding errors in it. A test value of 1236.965 gives a result of 1236.964966, but the library atof() function returns 1236.965000. My question is: how can I make the user-defined atof() implementation more 'correct'?

Can the library definition of atof() be found somewhere?

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

float str_to_float(char *);
void float_to_str(float,char *);

int main(){
    int max_size;
    float x;
    char *arr;
    printf("Enter max size of string : ");
    scanf("%d",&max_size);
    arr=malloc((max_size+1)*sizeof(char));
    scanf("%s",arr);
    x=str_to_float(arr);
    printf("%f\n%f",x,atof(arr));
    return 0;
}

float str_to_float(char *arr){
    int i,j,flag;
    float val;
    char c;
    i=0;
    j=0;
    val=0;
    flag=0;
    while ((c = *(arr+i))!='\0'){
//      if ((c<'0')||(c>'9')) return 0;
        if (c!='.'){
            val =(val*10)+(c-'0');
            if (flag == 1){
                --j;
            }
        }
        if (c=='.'){ if (flag == 1) return 0; flag=1;}
        ++i;
    }
    val = val*pow(10,j);
    return val;
}

Search for GLIBC's repository. Also it is unclear what you need to do since you haven't posted your own code. — meowgoesthedog, Sep 18 '18 at 16:55
Perhaps you are performing multiple operations on a `double` variable, which rarely stores an *exact* representation of the value, and so the error gets worse with each operation. — Weather Vane, Sep 18 '18 at 16:59
`atof` returns a `double` value but you are working with `float`. — Weather Vane, Sep 18 '18 at 17:02
What you're trying to do is actually a very hard problem, not an introductory programming exercise. — R.. GitHub STOP HELPING ICE, Sep 18 '18 at 17:05
@John: Mine in musl libc is here: https://git.musl-libc.org/cgit/musl/tree/src/internal/floatscan.c?id=v1.1.20#n66 (link is to start of the decimal case, which is the interesting part). It's **dense** code, but self-contained, no external bignum libraries or anything. — R.. GitHub STOP HELPING ICE, Sep 18 '18 at 17:13
Are you sure? I don't think 1236.965 can be exactly represented in a C float (IEEE 754 32-bit, binary32). A quick check shows 1236.9649658203125 is the closest value that can be represented in that format, which explains why you see 1236.964966 (when rounded to 6 decimal places). Check out this link and try it for yourself: https://baseconvert.com/ieee-754-floating-point — Marker, Sep 18 '18 at 17:19
In [this answer](https://stackoverflow.com/a/51304463/298225), I provided C++ code to convert any simple decimal numeral to binary floating-point correctly with round-to-nearest-ties-to-even. It is intended to demonstrate how the mathematics may be performed with elementary-school arithmetic; it is not intended for production use. This code handles only numerals with decimal digits and a decimal point; it does not handle scientific notation. For complete production code, a good understanding of floating-point arithmetic is required first. After that, the classic paper is… — Eric Postpischil, Sep 18 '18 at 18:29
… [*Correctly Rounded Binary-Decimal and Decimal-Binary Conversions*](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.4049) by David M. Gay. There is currently a copy [here](https://ampl.com/REFS/rounding.pdf). — Eric Postpischil, Sep 18 '18 at 18:30
@HansPassant [atof() is non-standard](https://stackoverflow.com/questions/52391330/create-a-precise-atof-implementation-in-c/52392540?noredirect=1#comment91727943_52391330) --> C11 specifies it in 7.22.1.1. Using `strtof()` has advantages, but both are specified in the C standard library. — chux - Reinstate Monica, Sep 18 '18 at 18:46
@eric: there's not much point pasting a link to a deleted answer. You and I can see it but people below the [moderator-tools](https://stackoverflow.com/help/privileges/moderator-tools) privilege threshold (10k rep) just get a generic "that question has been deleted" page. — rici, Sep 18 '18 at 19:37
@rici: It is the best link available at the moment. I already retrieved the code and plan to extend it to support scientific notation. After that, I expect to post it as a new answer to a suitable question. — Eric Postpischil, Sep 18 '18 at 20:15
@eric: that seems fine but perhaps it would be better to not even bother with the link for now, since the person to whom the comment is directed can't follow it. If it were me, I'd find it annoyingly frustrating so it's hard to imagine that it fits into SO's "welcome newcomers" policy. Anyway, hopefully this interchange serves as some kind of explanation for the OP. — rici, Sep 18 '18 at 20:45

Thomas Padron-McCarthy · Accepted Answer · 2018-09-19T09:04:00.517

4

Change all your floats to doubles. When I tested it, that gave the same result as the library function atof for your test case.

atof returns double, not float. Remember that double, not float, is the "normal" floating-point type in C. A floating-point literal, such as 3.14, is of type double, and library functions such as sin, log and (the perhaps deceptively named) atof work with doubles.

It will still not be "precise", though. The closest you can get to 1236.965 as a float is (exactly) 1236.9649658203125, and as a double 1236.964999999999918145476840436458587646484375, which will be rounded to 1236.965000 by printf. No matter how many bits you have in a binary floating-point number, 1236.965 can't be exactly represented, similar to how 1/3 can't be exactly represented with a finite number of decimal digits: 0.3333333333333333...
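To see the stored values for yourself, you can print them with more digits than `%f` shows by default. This is a small sketch; `format_fixed` is a hypothetical helper, and it assumes a C library whose printf prints binary floating-point values exactly (glibc does).

```c
#include <stdio.h>

/* Hypothetical helper: write v into buf with `digits` digits after the
   decimal point. A float argument is promoted to double without change. */
void format_fixed(double v, int digits, char *buf, size_t n) {
    snprintf(buf, n, "%.*f", digits, v);
}
```

Calling `format_fixed(1236.965f, 13, buf, sizeof buf)` should produce 1236.9649658203125, the float value quoted above, while `format_fixed(1236.965, 6, buf, sizeof buf)` gives 1236.965000 for the double.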

And also, as seen in the discussion in comments, this is a hard problem, with many possible pitfalls if you want code that will always give the closest value.

edited Sep 19 '18 at 09:04

answered Sep 18 '18 at 17:19

Thomas Padron-McCarthy


3

The value OP is testing is sufficiently short that a naive implementation will probably give the right result. Your answer is helpful in identifying one problem, but doesn't address that there are much deeper issues which are not really OP's fault but a matter of this being a deceptively hard problem. – R.. GitHub STOP HELPING ICE Sep 18 '18 at 17:21
1

Floating point numbers are tricky to work with. See [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html), [The Floating Point Gui.de](http://floating-point-gui.de/), [Why Are Floating Point Numbers Inaccurate](https://stackoverflow.com/questions/21895756/why-are-floating-point-numbers-inaccurate), [Floating-Point Numbers: Issues and Limitations](https://docs.python.org/2/tutorial/floatingpoint.html), and [Why Floating-Point Numbers May Lose Precision](https://msdn.microsoft.com/en-us/library/c151dt3s.aspx). – Bob Jarvis - Слава Україні Sep 18 '18 at 17:33

score 1 · Answer 2 · answered May 13 '21 at 14:57

I used your code as inspiration to write my own. What the other comments and answers do not recognize is that the original reason for the question is an embedded situation. In my case the library "atof" pulls in something that does "printf", which pulls in "systemcalls", which I don't have.

So... here I present a simple atof implementation (it does not implement exponential notation) that works in floats and is suitable for embedding.

My implementation uses far fewer variables.

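float ratof(const char *arr)
{
  float val = 0;       /* value accumulated so far */
  int afterdot = 0;    /* set once the decimal point has been seen */
  float scale = 1;     /* place value of the next fractional digit */
  int neg = 0;

  if (*arr == '-') {
    arr++;
    neg = 1;
  }
  /* Stop at the first character that is neither a digit nor the first '.'. */
  while (*arr) {
    if (*arr == '.' && !afterdot) {
      afterdot = 1;
    } else if (*arr >= '0' && *arr <= '9') {
      if (afterdot) {
        scale = scale / 10;
        val = val + (*arr - '0') * scale;
      } else {
        val = val * 10 + (*arr - '0');
      }
    } else {
      break;
    }
    arr++;
  }
  return neg ? -val : val;
}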

chux - Reinstate Monica · Answer 3 · 2018-09-18T19:46:42.410

0

how to make the user defined atof() implementation more 'correct' ?

Easy: 1) never overflow intermediate calculation and 2) only round once (at the end).

It is hard to do those 2 steps.

Note: C's atof(), strtof(), etc. also handle exponential notation - in decimal and hex.
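For comparison, a small sketch of what the library already accepts. `parse_full` is a hypothetical wrapper around the standard `strtod()`; it only reports whether the whole string was consumed.

```c
#include <stdlib.h>

/* Hypothetical wrapper: returns 1 and stores the value in *out when the
   entire string s is a valid number for strtod(), otherwise returns 0. */
int parse_full(const char *s, double *out) {
    char *end;
    double v = strtod(s, &end);
    if (end == s || *end != '\0') return 0;  /* nothing parsed, or trailing junk */
    *out = v;
    return 1;
}
```

With this, `"1.5e3"` parses as 1500.0 and the hexadecimal form `"0x1.8p1"` parses as 3.0 (1.5 times 2), neither of which the hand-written loop in the question handles.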

Potential roundings

val*10
(val*10)+(c-'0');
pow(10,j)
val*pow(10,j)  // This last multiplication is the only tolerable one.

Potential overflow (even though the final answer is within range)

val*10
(val*10)+(c-'0');
pow(10,j)

Using a wider type like double can greatly lessen the occurrence of such problems and achieve OP's "more 'correct'". Yet they still exist.

Getting the best (correct) floating point result from all string inputs is not an easy problem to solve.

Sample approaches:

Avoid overflow: rather than pow(10,j):

val = val*pow(5,j);  // rounds, `pow(5,j)` not expected to overflow a finite final result.
val = val*pow(2,j);  // Does not round except at extremes

Code should form (ival*10)+(c-'0') using extended integer math in the loop for exactness.
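A minimal sketch of that idea, under stated assumptions: digits and at most one '.', no sign or exponent, and at most 19 digits so the integer always fits in `uint64_t`. The name `parse_decimal_once` is made up for this example. The result is rounded only once, by the final division, as long as the digit string as an integer is below 2^53; longer inputs also round in the integer-to-double conversion.

```c
#include <stdint.h>

/* Hypothetical sketch: accumulate all digits exactly in an integer, then
   scale once. Returns 0.0 for malformed input or more than 19 digits. */
double parse_decimal_once(const char *s) {
    uint64_t ival = 0;      /* every digit, with the decimal point ignored */
    int digits = 0;
    int frac_digits = 0;    /* digits seen after the '.' */
    int seen_dot = 0;

    for (; *s != '\0'; ++s) {
        if (*s == '.') {
            if (seen_dot) return 0.0;
            seen_dot = 1;
        } else if (*s >= '0' && *s <= '9') {
            if (++digits > 19) return 0.0;   /* 19 digits always fit in 64 bits */
            ival = ival * 10 + (uint64_t)(*s - '0');
            if (seen_dot) ++frac_digits;
        } else {
            return 0.0;
        }
    }

    double scale = 1.0;
    for (int i = 0; i < frac_digits; ++i)
        scale *= 10.0;                       /* powers of ten up to 1e22 are exact */
    return (double)ival / scale;             /* the one rounding step for short inputs */
}
```

Dividing by an exactly representable power of ten, rather than multiplying by an inexact negative power, is what keeps short inputs such as "1236.965" correctly rounded. It does not address very long digit strings or exponents, which are the hard cases discussed in the comments.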

Yet this is just scratching the surface of the many corner cases.

@Eric Postpischil linked in a comment to robust C++ code that handles non-exponential notation string input well. It does the initial math using integers and only rounds later in the process. The linked code is not visible unless your rep is 10,000+, as the question was deleted.

edited Sep 18 '18 at 19:46

answered Sep 18 '18 at 18:23

chux - Reinstate Monica


1

[Actually, it is pretty easy. It can be done with techniques taught in elementary school.](https://stackoverflow.com/a/51304463/298225) Doing it fast with constant memory (given fixed precision and exponent range) is harder. – Eric Postpischil Sep 18 '18 at 18:33
@EricPostpischil Yes, this goal is easier with relaxed precision requirements. With no relaxation, a singular challenging case area for "best" result is when `char *s` is a value _nearly exactly_ half-way between `TRUE_MIN` and `TRUE_MIN`*2. That takes a lot of precision to know to round up or down - I suspect about worst case. – chux - Reinstate Monica Sep 18 '18 at 18:37
There are no relaxed precision requirements. By “fixed precision and exponent range” in the previous comment, I meant that the constant memory requirement may require preparing tables in advance, using knowledge of the precision and exponent range of the target format. The code I point to has no limits on the input size or precision and produces a **correctly rounded result** for the target format. It handles all the near-the-midpoint cases correctly. Again, this requires only elementary-school arithmetic. – Eric Postpischil Sep 18 '18 at 18:44
@EricPostpischil My prior comment was not meant to imply anything about the linked content, just the raw comment. OP does have relaxed precision requirements with only seeking "more 'correct'" and not "correct". OP appears to tolerate some level of not best (incorrect) results. Yes the math is elementary, – chux - Reinstate Monica Sep 18 '18 at 19:00
@EricPostpischil On review, your [linked answer](https://stackoverflow.com/a/51304463/298225) handles the non-exponential strings for `atof()` nicely, as OP's `str_to_float()` aims to do. I'd UV it had the question not been deleted. Too bad - it is quite informative - good for CR. Yet C's more "correct" `atof()/strtod()/etc.` do handle exponentially represented strings, and that obliges a wider (though not unlimited) precision for the power and multiply code. Something hard to do well, even for someone past learner levels. – chux - Reinstate Monica Sep 18 '18 at 19:37
@EricPostpischil `bool RoundUp = Bits[Round] && (Bits[LowBit] || Bits[Sticky]);` in `GetValue()`: it is unclear to me whether that is correct when the final result is a subnormal. I'd expect that the bits used to determine rounding would depend on `Exponent` in that region. – chux - Reinstate Monica Sep 18 '18 at 20:05
I was thinking about exponential notation. I think it can be handled just by adjusting where the decimal point is, inserting virtual zeros if needed. So I am going to modify the code for that and post a new version in the future. I will look at the rounding for subnormals. – Eric Postpischil Sep 18 '18 at 20:17
Note that `PushBitLow` clamps at the lower exponent bound. Above that, it treats leading zeros as non-significant digits. Once the minimum exponent is reached, zeros become significant digits. So the clock starts ticking on how many bits there are in a significand, which I think puts the rounding and sticky bits for subnormals in the right position. I will look further. – Eric Postpischil Sep 18 '18 at 20:22
