I'll be the first to admit that my overall knowledge of low-level programming is a bit sparse. I understand many of the core concepts, but I don't use them on a regular basis. That being said, I was absolutely astounded at how much code dtoa.c needs.

For the past couple of months I have been working on an ECMAScript implementation in C#, and I've been slowly filling in the holes in my engine. Last night I started working on Number.prototype.toString, which is described in section 15.7.4.2 of the ECMAScript specification (pdf). In section 9.8.1, NOTE 3 offers a link to dtoa.c, but I was looking for a challenge, so I waited to view it. The following is what I came up with.

private IDynamic ToString(Engine engine, Args args)
{
    var thisBinding = engine.Context.ThisBinding;
    if (!(thisBinding is NumberObject) && !(thisBinding is NumberPrimitive))
    {
        throw RuntimeError.TypeError("The current 'this' must be a number or a number object.");
    }

    var numPrimitive = thisBinding.ToNumberPrimitive();
    var num = numPrimitive.Value;

    if (double.IsNaN(num))
    {
        return new StringPrimitive("NaN");
    }
    else if (double.IsPositiveInfinity(num))
    {
        return new StringPrimitive("Infinity");
    }
    else if (double.IsNegativeInfinity(num))
    {
        return new StringPrimitive("-Infinity");
    }

    var radix = !args[0].IsUndefined ? args[0].ToNumberPrimitive().Value : 10D;

    if (radix < 2D || radix > 36D)
    {
        throw RuntimeError.RangeError("The parameter [radix] must be between 2 and 36.");
    }
    else if (radix == 10D)
    {
        return numPrimitive.ToStringPrimitive();
    }

    var sb = new StringBuilder();
    var isNegative = false;

    if (num < 0D)
    {
        isNegative = true;
        num = -num;
    }

    // Split the number into its integral and fractional parts; routing the
    // subtraction through decimal keeps the fractional part from picking up
    // extra double rounding error.
    var integralPart = Math.Truncate(num);
    var decimalPart = (double)((decimal)num - (decimal)integralPart);
    var radixChars = RadixMap.GetArray((int)radix);

    if (integralPart == 0D)
    {
        sb.Append('0');
    }
    else
    {
        // Peel off digits least-significant first; they are reversed below.
        var integralTemp = integralPart;
        while (integralTemp > 0)
        {
            sb.Append(radixChars[(int)(integralTemp % radix)]);
            integralTemp = Math.Truncate(integralTemp / radix);
        }
    }

    // The digits were appended least-significant first, so reverse them in place.
    for (int i = 0, j = sb.Length - 1; i < j; i++, j--)
    {
        var swap = sb[i];
        sb[i] = sb[j];
        sb[j] = swap;
    }

    if (isNegative)
    {
        sb.Insert(0, '-');
    }

    if (decimalPart == 0D)
    {
        return new StringPrimitive(sb.ToString());
    }

    // Generate fractional digits: repeatedly multiply the remaining fraction
    // by the radix and peel off the integral part as the next digit, stopping
    // after a fixed digit count or once the emitted digits reconstruct the
    // original fraction closely enough.
    var runningValue = 0D;
    var decimalIndex = 1D;
    var decimalTemp = decimalPart;

    sb.Append('.');
    while (decimalIndex < 100 && decimalPart - runningValue > 1.0e-50)
    {
        var result = decimalTemp * radix;
        var integralResult = Math.Truncate(result);
        runningValue += integralResult / Math.Pow(radix, decimalIndex++);
        decimalTemp = result - integralResult;
        sb.Append(radixChars[(int)integralResult]);
    }

    return new StringPrimitive(sb.ToString());
}
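
For what it's worth, here is the integral-part digit loop as a standalone sketch (a hypothetical helper, not part of my engine) that can be sanity-checked against Convert.ToString for the bases the BCL supports:

    using System;
    using System.Text;

    static class RadixDemo
    {
        // Standalone version of the integral-part loop above: take n % radix
        // as the next digit (least significant first) and divide by radix.
        static string IntegralToRadix(double value, int radix)
        {
            const string digits = "0123456789abcdefghijklmnopqrstuvwxyz";
            var n = Math.Truncate(value);
            if (n == 0D) return "0";
            var sb = new StringBuilder();
            while (n > 0)
            {
                sb.Insert(0, digits[(int)(n % radix)]);
                n = Math.Truncate(n / radix);
            }
            return sb.ToString();
        }

        static void Main()
        {
            Console.WriteLine(IntegralToRadix(255D, 16));   // ff
            Console.WriteLine(Convert.ToString(255, 16));   // ff (the BCL supports bases 2, 8, 10, 16)
            Console.WriteLine(IntegralToRadix(86400D, 36)); // 1uo0
        }
    }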

Can anyone with more experience in low-level programming explain why dtoa.c has roughly 40 times as much code? I just cannot imagine C# being that much more productive.

ChaosPandion

5 Answers

dtoa.c contains two main functions: dtoa(), which converts a double to a string, and strtod(), which converts a string to a double. It also contains a lot of support functions, most of which are for its own implementation of arbitrary-precision arithmetic. dtoa.c's claim to fame is getting these conversions right, and that can only be done, in general, with arbitrary-precision arithmetic. It also has code to round conversions correctly in four different rounding modes.

Your code only tries to implement the equivalent of dtoa(), and since it uses floating-point arithmetic to do its conversions, it will not always get them right. (Update: see my article http://www.exploringbinary.com/quick-and-dirty-floating-point-to-decimal-conversion/ for details.)
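
For illustration (this is C#, to match the question, not code from dtoa.c): two doubles can lie so close together that only a carefully produced digit string tells them apart, which is exactly the kind of case a correct converter must handle.

    using System;

    class RoundTrip
    {
        static void Main()
        {
            double sum = 0.1 + 0.2;

            // The sum is not the same double as the literal 0.3, so a correct
            // converter must emit enough digits to distinguish the two values.
            Console.WriteLine(sum == 0.3);             // False
            Console.WriteLine(sum.ToString("G17"));    // 0.30000000000000004
            Console.WriteLine((0.3).ToString("G17"));  // 0.29999999999999999
        }
    }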

(I've written a lot about this on my blog, http://www.exploringbinary.com/ . Six of my last seven articles have been about strtod() conversions alone. Read through them to see how complicated it is to do correctly rounded conversions.)

Rick Regan
  • Did you try just plain old toString() (base 10)? It's interesting -- to my knowledge, Javascript uses dtoa.c, but dtoa.c only prints decimal strings. I wonder what is used to print to non-decimal bases? As for the amount of code, I'd say it's the "extreme accuracy" requirement (but Mark is more qualified to answer, having worked with dtoa.c extensively.) – Rick Regan Jul 04 '10 at 16:57
  • @Rick - It looks like I didn't need to perform that extensive test to find it. @sth found that `(1.3333333333333333).toString(3)` is an example where my code fails. – ChaosPandion Jul 04 '10 at 17:24
  • Two things: 1) dtoa.c only handles base 10, so we're no longer comparing your code to dtoa.c (unless someone in Javascript has modified it to do so). 2) 1.3333333333333333 decimal is conceptually closest to 1.1 base 3, but don't forget it is going through binary floating-point first and then to base 3; I think it's allowed not to be 1.1 in this case, as long as what it prints represents the value in the floating-point variable faithfully. – Rick Regan Jul 04 '10 at 17:53
  • @Rick - You know what? My output matches the output of IE. I can't help but wonder what exactly these two browsers are doing. It is a bit comforting that my output matches a widely used implementation though. – ChaosPandion Jul 04 '10 at 18:56
  • Could you give some examples of matching output, for conversions to decimal strings? I'd like to understand better exactly what you're comparing. (Also, are you running random tests, or just hand picked examples? I don't think you can say the output is the same unless you run millions of examples). – Rick Regan Jul 04 '10 at 19:04
  • It's rather unexpected that your simple program's double to decimal conversions match IE and Firefox exactly. If that's true, then there'd be no need for dtoa.c. I'm just suggesting that maybe there's an error in your test methodology. Could you at least give me one example of a double to decimal conversion that matches, just so I can see what you are comparing? (And you're welcome for the blog link!) – Rick Regan Jul 04 '10 at 20:10
  • @Chaos: The basis of your question is basically: "My code does everything that dtoa does, so why is dtoa so large?" That is incorrect; your code does not do what dtoa does at all, so don't be surprised when people call you on it. – Ed S. Aug 18 '10 at 00:40
  • Wow, WAY too much conversation here. You guys should have done this in chat. – unixman83 Feb 13 '12 at 10:16

Producing good results for conversions between decimal and binary floating-point representations is a rather difficult problem.

The major source of difficulty is that many decimal fractions, even simple ones, cannot be represented exactly in binary floating point -- for example, 0.5 can (obviously), but 0.1 cannot. And, going the other way (from binary to decimal), you generally don't want the absolutely accurate result (for example, the exact decimal value of the closest IEEE-754 double to 0.1 is actually 0.1000000000000000055511151231257827021181583404541015625), so you normally want some rounding.
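
You can check that exact value yourself. Here is a small sketch in C# (the asker's language; it assumes a positive, normal double with a negative binary exponent) that decomposes the double's bits and prints its exact decimal expansion using arbitrary-precision integers -- the same kind of machinery dtoa.c has to build for itself in C:

    using System;
    using System.Numerics;

    class ExactDecimal
    {
        static void Main()
        {
            // Decompose 0.1 into its IEEE-754 parts: value = mantissa * 2^exponent.
            long bits = BitConverter.DoubleToInt64Bits(0.1);
            long mantissa = (bits & 0xFFFFFFFFFFFFFL) | (1L << 52); // implicit leading 1
            int exponent = (int)((bits >> 52) & 0x7FF) - 1075;      // unbias, shift past fraction

            // exponent is negative here, so the exact value is
            // mantissa / 2^(-exponent) = (mantissa * 5^(-exponent)) / 10^(-exponent).
            BigInteger scaled = mantissa * BigInteger.Pow(5, -exponent);
            string digits = scaled.ToString().PadLeft(-exponent, '0').TrimEnd('0');
            Console.WriteLine("0." + digits);
            // prints 0.1000000000000000055511151231257827021181583404541015625
        }
    }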

So, conversion often involves approximation. Good conversion routines guarantee to produce the closest possible approximation within particular (word size or number of digits) constraints. This is where most of the complexity comes from.

Take a look at the paper cited in the comment at the top of the dtoa.c implementation, Clinger's How to Read Floating Point Numbers Accurately, for a flavour of the problem; and perhaps also David M. Gay's (the author's) paper, Correctly Rounded Binary-Decimal and Decimal-Binary Conversions.

(Also, more generally: What Every Computer Scientist Should Know About Floating Point Arithmetic.)

Matthew Slattery
  • I am well aware of the challenges that floating-point presents. My question has more to do with the difference in the amount of code. When I was testing my example against Firefox, my code produced identical results. *Firefox does use dtoa.c*. – ChaosPandion Jul 04 '10 at 00:03

Based on a quick glance, a fair amount of the C version is dealing with multiple platforms: the file looks like it is meant to be generically usable across compilers (C & C++), bit widths, floating-point implementations, and platforms, with tons of #define configurability.
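
The same mechanism exists in C#, for what it's worth; a minimal sketch of the pattern (the symbol name is made up for illustration):

    #define HAVE_LONG_LONG // in dtoa.c this would come from the build configuration;
                           // C# only allows #define at the top of a file

    using System;

    class Portability
    {
        static void Main()
        {
    #if HAVE_LONG_LONG
            Console.WriteLine("64-bit code path compiled in");
    #else
            Console.WriteLine("32-bit fallback compiled in");
    #endif
        }
    }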

dkackman
  • I agree. I also had a quick look at the code, and I felt sick at the amount of work it represents for such a simple operation. With .NET, I am glad to see those times behind us for application programming. – jdehaan Jul 03 '10 at 22:45
  • Wouldn't it just be easier to have an archive of C files with each supporting a separate platform? Reading all those `#ifdef` blocks gave me a brain aneurysm... – ChaosPandion Jul 03 '10 at 22:49
  • Yeah, the problem isn't the languages, it's just an awkward design overall. The same effect could be achieved with better modularity. – Cogwheel Jul 03 '10 at 23:15
  • @ChaosPandion: depends whether you want your platform-specific configuration to be done in a C header file (setting the defines), or in a makefile (selecting the right .c file to build). Also, if there are enough defines to give you an aneurysm, that could be 2^aneurysm distinct .c files if all the possibilities were multiplied out. In practice, when writing this kind of ultra-portable stuff you usually want a small number of source files (in this case 1, but sometimes a few more) driven by a large number of configuration options, rather than the other way around. – Steve Jessop Jul 04 '10 at 01:01
  • ... and the reason you might want to write the code to support platforms based on their behaviour / properties, rather than writing exactly one dtoa implementation for each platform, is so that you can port to new platforms just by writing a header file with the right combination of defines, rather than re-writing all of stdlib every time. A massively-configurable implementation like this isn't always the best, but it may not be as bad as it looks when you consider the alternative. – Steve Jessop Jul 04 '10 at 01:03
  • @Steve - You know, I think it just may have been the initial shock of seeing so much code. Looking at it in more detail, it actually comes together a bit better. I also found @Tyler's comment in @Lajnold's answer to be a bit enlightening on the subject. – ChaosPandion Jul 04 '10 at 01:26
  • @dkackman - What do you think contributes more to the amount of code, cross-platform compatibility or the extreme accuracy that dtoa provides? – ChaosPandion Jul 04 '10 at 16:30
  • @ChaosPandion hard for me to judge without digging into it (rough guess: about 50/50). One way to separate the wheat from the chaff would be to look at the preprocessed output for a compilation that has the #defines set the way you want. Using MS tools you can do this with something close to: CL /E /C dtoa.c > dtoa.pre.c. This will show you what the preprocessor is actually submitting to the compiler. – dkackman Jul 04 '10 at 16:52

I also think that the code in dtoa.c might be more efficient (independent of language). For example, it seems to do some bit-fiddling, which in the hands of an expert often means speed. I assume it simply uses a less intuitive algorithm for speed reasons.
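
As an illustration of the kind of trick I mean (my own C# sketch, not code from dtoa.c): the binary exponent of a double can be read straight out of its bits instead of being computed with a logarithm:

    using System;

    class BitTrick
    {
        static void Main()
        {
            double d = 1234.5678;

            // Pull the unbiased binary exponent directly from the IEEE-754
            // bit pattern rather than computing a floating-point logarithm.
            long bits = BitConverter.DoubleToInt64Bits(d);
            int exponent = (int)((bits >> 52) & 0x7FF) - 1023;

            Console.WriteLine(exponent);                   // 10 (1024 <= d < 2048)
            Console.WriteLine(Math.Floor(Math.Log(d, 2))); // 10
        }
    }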

Lajnold
  • A lot of C library functions (and C library extensions) are written like this. It makes sense, because they are generally not updated, edited, or otherwise maintained very frequently, but they are called all over the place from vast numbers of programs. It only makes sense to make them as fast as possible even at the expense of some maintainability. – Tyler McHenry Jul 03 '10 at 23:19

Short answer: because dtoa.c works.

This is exactly the difference between a well-debugged product and a NIH prototype.

Pavel Radzivilovsky
  • "Not invented here", as in "custom ad-hoc code, as an alternative to popular 3rd-party code". – Pavel Radzivilovsky Jul 04 '10 at 19:52
  • I mean, you are right, but it explains nothing as to why there is so much code. Who knows, maybe someone else wrote an implementation that converts correctly with half as much code and twice as much functionality. – ChaosPandion Jul 04 '10 at 19:57