
When I write a large CSV-like file containing millions of double values, the bottleneck appears to be the conversion of double to string.

What is the fastest way to append a double value to a StreamWriter, with a fixed number of digits after the point?

Currently I use

// called once 
System.Globalization.NumberFormatInfo nfi = new System.Globalization.NumberFormatInfo();
nfi.NumberDecimalDigits = 4;

// called millions of times in a loop
streamwriter.Write(mydouble.ToString(nfi));

If I write a constant string instead of a double, the program finishes 10 times faster.
If I write an int instead of a double, it is still more than twice as fast.
(All tests were executed in release mode, without a debugger attached)

What is the fastest way to convert this double to a string?
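
A side note on the snippet above: as far as I know, `double.ToString(IFormatProvider)` uses the general ("G") format, which does not consult `NumberDecimalDigits`. That property is only applied by the fixed-point ("F") and number ("N") format specifiers. The sketch below shows the difference; the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

static class FixedDigitsDemo
{
    // Same setup as in the question: invariant-style format info with 4 decimal digits.
    static readonly NumberFormatInfo Nfi = new NumberFormatInfo { NumberDecimalDigits = 4 };

    // General ("G") format: NumberDecimalDigits is not consulted.
    public static string General(double d) => d.ToString(Nfi);

    // Fixed-point ("F") format without an explicit precision: uses NumberDecimalDigits.
    public static string Fixed(double d) => d.ToString("F", Nfi);

    static void Main()
    {
        Console.WriteLine(General(0.5)); // 0.5
        Console.WriteLine(Fixed(0.5));   // 0.5000
    }
}
```

So if exactly four digits after the point are required, `mydouble.ToString("F", nfi)` (or `"F4"`) is probably what the loop should call, although this does not by itself make the conversion faster.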


I have included a benchmark to illustrate my problem below:

I write 1 million doubles to a file, 100 times in a row.

The total time is 25.2 seconds. The loop with only double.ToString and no streamwriter.Write finishes in 21 seconds. The loop with only streamwriter.Write finishes in 3.5 seconds.

System.Globalization.NumberFormatInfo nfi = new System.Globalization.NumberFormatInfo();
nfi.NumberDecimalDigits = 4;
double d = 0.1234;
Stopwatch watch;

watch = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
    using (StreamWriter sw = new StreamWriter(@"c:\temp\test.txt", false, Encoding.UTF8, 65536))
    {
        for (int j = 0; j < 1000000; j++)
        {
            sw.Write(d.ToString(nfi));
        }
    }
}
Console.WriteLine("stream.Write & double.ToString: {0}", watch.ElapsedMilliseconds);

watch = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
    using (StreamWriter sw = new StreamWriter(@"c:\temp\test.txt", false, Encoding.UTF8, 65536))
    {
        for (int j = 0; j < 1000000; j++)
        {
            sw.Write("0.1234");
        }
    }
}
Console.WriteLine("only stream.Write: {0}", watch.ElapsedMilliseconds);

watch = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
    using (StreamWriter sw = new StreamWriter(@"c:\temp\test.txt", false, Encoding.UTF8, 65536))
    {
        for (int j = 0; j < 1000000; j++)
        {
            string s = d.ToString(nfi);
        }
    }
}
Console.WriteLine("only double.ToString: {0}", watch.ElapsedMilliseconds);
HugoRune
  • Interesting: [here](http://cc.davelozinski.com/c-sharp/fastest-way-to-convert-an-int-to-string), a test was made for int to string conversion, and `ToString()` turned out to be the best. – Wiktor Stribiżew Sep 21 '16 at 11:29
  • I tried some of the other methods before: previously I had used String.Format, and double.ToString appeared to be faster. But I am still trying to find a better way. It is strange that such an operation, rather than the disk speed, should be the bottleneck when writing a file. – HugoRune Sep 21 '16 at 11:33
  • The speed will be different if the application is built using DEBUG or RELEASE. Release uses the built-in co-processor to do floating point arithmetic while DEBUG simulates the co-processor. – jdweng Sep 21 '16 at 11:49
  • I forgot to mention: I did all performance tests in release mode, without a debugger attached (by pressing Ctrl+F5). I'll update the question accordingly. – HugoRune Sep 21 '16 at 11:56
  • Converting a million doubles is not the bottleneck; it takes about half a second. The problem is obviously writing to a stream. – Motlicek Petr Sep 21 '16 at 12:02
  • http://stackoverflow.com/questions/18757097/writing-data-into-csv-file – Motlicek Petr Sep 21 '16 at 12:12
  • @MotlicekPetr My apologies, I probably should have included a small benchmark program; I'll try to add one later. But I tried the above example even without the streamwriter, just with `string s = mydouble.ToString(nfi);`, and the time improved only slightly. Whereas if I removed the mydouble.ToString(nfi) and instead wrote a constant string to the stream, I got a 10-fold improvement. – HugoRune Sep 21 '16 at 12:12
  • `StreamWriter.Write` has an overload that takes a `double` directly. Have you tried using that? Does that make any difference? – Chris Dunaway Sep 21 '16 at 15:48
  • I hadn't tried that before, since I saw no easy way to use a custom IFormatProvider. But I just tried it, and there was no improvement. Reflection shows that `StreamWriter.Write(double)` simply calls `StreamWriter.Write(value.ToString(FormatProvider));` – HugoRune Sep 21 '16 at 15:53

2 Answers


Converting a double to a string is a complicated matter and can be a huge performance killer if you need to convert a lot of doubles. Your only options are either to implement a faster conversion function, if the .NET version is too slow for you, or not to convert at all (and find another way of solving your problem).

Try Ryū's fast float-to-string conversion algorithm, which also has a double-to-string implementation. If you need to control the number of decimal places, the simplest way is to add a parameter and use a min function to limit the output length, taking into account whether a sign is present.

Ryū is faster than Grisu, a fast conversion algorithm introduced by Florian Loitsch for which a C# version exists. With either one you need to apply the four-decimal format yourself, but that can be done with some simple string manipulation.
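
As a rough illustration of that string manipulation, here is a minimal sketch. It assumes the converter returns plain decimal notation (no exponent) with '.' as the separator, and it truncates instead of rounding. `DecimalTrim.ToFixedDecimals` is a hypothetical helper, not part of Ryū, Grisu or .NET.

```csharp
using System;

static class DecimalTrim
{
    // Hypothetical helper: cut or zero-pad a plain decimal string to `decimals` places.
    // Assumes no exponent notation and '.' as the separator. Truncates, does not round.
    public static string ToFixedDecimals(string s, int decimals)
    {
        int dot = s.IndexOf('.');
        if (dot < 0)
        {
            // No fractional part at all: append one made of zeros.
            return decimals == 0 ? s : s + "." + new string('0', decimals);
        }
        if (decimals == 0)
        {
            return s.Substring(0, dot);
        }
        // Target length; a leading '-' is already counted because `dot` is an index into s.
        int wanted = dot + 1 + decimals;
        int keep = Math.Min(s.Length, wanted); // never read past the end of s
        return s.Substring(0, keep).PadRight(wanted, '0');
    }
}
```

For example, `DecimalTrim.ToFixedDecimals("-12.5", 4)` yields `"-12.5000"` and `DecimalTrim.ToFixedDecimals("0.123456", 4)` yields `"0.1234"`. If rounding is required, the digits have to be adjusted before cutting, which this sketch does not attempt.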

Dave Jarvis
Ton Plooij

The general-purpose double-to-string converter has to watch out for all kinds of edge cases like NaN, super-large numbers, super-small numbers, not to mention figuring out on-the-fly how many digits to retain to the right of the decimal point.

If you know the number range, you may be able to do it yourself, by converting various pieces to integers. For example (in C):

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

// write_fixed3 is an illustrative name; assumes |v| fits in an int.
void write_fixed3(FILE *file, double v)
{
    bool bNegative = false;
    if (v < 0){v = -v; bNegative = true;} // make v >= 0
    double fv = floor(v); // get integer part as double
    int i = (int)fv;      // get integer part as integer
    int f = (int)floor((v - fv)*1000.0); // get fraction thousandths as integer (truncated, not rounded)
    // print the integer and the fractional thousandths, both as integers
    if (bNegative){
        fprintf(file, "-%d.%03d", i, f);
    } else {
        fprintf(file, "%d.%03d", i, f);
    }
}

or something along those lines...
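
A C# take on the same idea might look like the sketch below. It scales the value to an integer number of ten-thousandths and writes the digits into a reusable char buffer, so no string is allocated per value. The names are hypothetical, and the assumptions are that the value is finite and that |value| stays below about 1e12. Because of the multiplication and `Math.Round`, the last digit can differ from what `ToString("F4")` prints for borderline values, and whether this is actually faster than `double.ToString` would have to be measured.

```csharp
using System;
using System.IO;

static class FixedPointWriter
{
    // Hypothetical helper: formats `value` with exactly four decimals into the END of `buffer`
    // and returns the index of the first character. A buffer of 32 chars is enough.
    // Assumes a finite value with |value| < 1e12.
    public static int Format(double value, char[] buffer)
    {
        bool negative = value < 0;
        long scaled = (long)Math.Round(Math.Abs(value) * 10000.0); // value in ten-thousandths
        if (scaled == 0) negative = false;                          // avoid printing "-0.0000"

        int pos = buffer.Length;
        // Four fractional digits, emitted right to left.
        for (int i = 0; i < 4; i++)
        {
            buffer[--pos] = (char)('0' + scaled % 10);
            scaled /= 10;
        }
        buffer[--pos] = '.';
        // Integer digits; the do/while guarantees at least one digit ("0").
        do
        {
            buffer[--pos] = (char)('0' + scaled % 10);
            scaled /= 10;
        } while (scaled != 0);
        if (negative) buffer[--pos] = '-';
        return pos;
    }

    // Writes the formatted characters straight to the writer, without creating a string.
    public static void Write(TextWriter writer, double value, char[] buffer)
    {
        int start = Format(value, buffer);
        writer.Write(buffer, start, buffer.Length - start);
    }
}
```

In the benchmark loop this would be used as `FixedPointWriter.Write(sw, d, buffer)` with a single `char[] buffer = new char[32]` allocated once outside the loop.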

Mike Dunlavey
  • It is an interesting idea, but my initial tests seem to show that writing out two integers and making the required conversions is actually slower than writing a single double in C#. – HugoRune Sep 22 '16 at 09:14
  • @HugoRune: When I have to deal with files that big, I know no human will ever read them, so I write/read them in binary. It's faster and more accurate. SAS knew this ages ago, so they defined something called "SAS Transport Format" or "xpt". – Mike Dunlavey Sep 22 '16 at 12:29
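
For the binary route mentioned in the last comment, a minimal C# sketch could use `BinaryWriter` and `BinaryReader`, which store each double as its 8 raw bytes with no text conversion. The layout used here (an `int` count followed by the values) and the class name are assumptions made for the example, not an established format such as xpt.

```csharp
using System;
using System.IO;
using System.Text;

static class BinaryDoubles
{
    // Writes a count followed by the raw 8-byte representation of each double.
    // The count prefix is an assumed layout for this sketch.
    public static void Write(Stream stream, double[] values)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(values.Length);
            foreach (double v in values)
            {
                writer.Write(v);
            }
        }
    }

    // Reads back what Write produced; the values round-trip bit for bit.
    public static double[] Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int count = reader.ReadInt32();
            var values = new double[count];
            for (int i = 0; i < count; i++)
            {
                values[i] = reader.ReadDouble();
            }
            return values;
        }
    }
}
```

Passing a buffered `FileStream` instead of a `MemoryStream` gives the on-disk version. The trade-off is that the file is no longer CSV-like or human-readable.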