3

I have a program which stores a bunch of struct instances containing many members of type double. Every so often I dump these to a file, which I was doing using a StringBuilder, e.g.:

 StringBuilder builder = new StringBuilder(256);

 builder.AppendFormat("{0};{1};{2};", x.A.ToString(), x.B.ToString(), x.C.ToString()); 

where 'x' is an instance of my type, and A, B, C are members of x of type double. I call ToString() on each of these to avoid boxing. However, these calls to ToString still allocate lots of memory in the context of my application, and I'd like to reduce this. What I'm thinking is to have a character array, write each member directly into it, then create one string from that character array and dump that to the file. A few questions:

1) Does what I'm thinking of doing sound reasonable? Is there anything anyone knows of that would already achieve something similar?

2) Is there already something built in to convert a double to a character array (which I guess would be up to some parameterised precision)? Ideally I want to pass in my array and some index and have it start writing there.

The reason I'm trying to do this is to reduce big spikes in memory when my app is running, as I run many instances and often find myself limited by memory.

Cheers A

user555265
  • `StringBuilder` is overkill for simple concats like this. – asawyer Jun 06 '12 at 12:22
  • Why don't you simply use: `builder.AppendFormat("{0};{1};{2};", x.A, x.B, x.C)` instead? – Tim Schmelter Jun 06 '12 at 12:22
  • @asawyer actually there are many more members, this is just for example – user555265 Jun 06 '12 at 12:25
  • If you do as Tim Schmelter suggests you __don't__ introduce boxing. On the other hand, calling `ToString()` on the `double` will allocate a new object on the heap, effectively "boxing" the double into a string, which is exactly what you want to avoid. – Martin Liversage Jun 06 '12 at 12:25
  • @MartinLiversage - wouldn't what Tim suggested first box the double, and then call ToString on that boxed object? – user555265 Jun 06 '12 at 12:27
  • @MartinLiversage Yes it would box the double before calling ToString on it. Oh, and BTW, if you do use strings I suggest you use the CultureInfo.InvariantCulture. But then again, see my answer below. – Kris Vandermotten Jun 06 '12 at 12:33
  • Boxing a double is probably going to be less expensive than creating a temporary string. – Joe White Jun 06 '12 at 12:43
  • @JoeWhite wouldn't it box and then call ToString anyway? .. so still creates a temporary string.. – user555265 Jun 06 '12 at 12:44
  • @user555265 calling ToString() on an int should not box it, as it most probably overrides object.ToString()'s implementation. See http://stackoverflow.com/questions/3499651/boxing-a-thing-of-the-past – Slugart Jun 06 '12 at 13:19
  • @Slugart makes sense that should be optimised to avoid boxing .. i guess it is something that was added later on in the .NET history because i see many articles like: http://www.andyfrench.info/2010/07/reminder-about-boxing-and-unboxing-in-c.html which suggest boxing is going on. . Cheers! – user555265 Jun 06 '12 at 13:24
  • @KrisVandermotten: Calling `Double.ToString()` overrides the base class method and does not cause boxing. Also see: http://stackoverflow.com/questions/436363/does-calling-a-method-on-a-value-type-result-in-boxing-in-net – Martin Liversage Jun 06 '12 at 13:32
  • @user555265 in that article the reason boxing occurs is because the parameter to AppendFormat is Object/Object[]. Whereas if you call int.ToString() before passing it to AppendFormat you pass a string which is already on the heap. – Slugart Jun 06 '12 at 13:35
  • @MartinLiversage I know, but it's not Double.ToString() that causes the boxing in builder.AppendFormat("{0};{1};{2};", x.A, x.B, x.C), it's StringBuilder.AppendFormat(string, object[]) (http://msdn.microsoft.com/en-us/library/cazfhf32). Also, don't forget we're passing an array here. That array is allocated on the heap too, in addition to the boxed doubles. – Kris Vandermotten Jun 06 '12 at 13:37

5 Answers

3

Is the file required to be some kind of text format?

If not, by far the most efficient way to do this is using a BinaryWriter (and BinaryReader to read them back).

See http://msdn.microsoft.com/en-us/library/system.io.binarywriter.aspx for more information.
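As a rough illustration of the binary route, here is a minimal sketch. It assumes the file does not need to be human-readable, and the `Sample` struct and `BinaryDump` class are hypothetical names standing in for the asker's own type. Each double goes out as its raw 8 bytes, so no intermediate strings are created.

```csharp
using System;
using System.IO;

// Hypothetical struct standing in for the asker's type.
public struct Sample
{
    public double A, B, C;
}

// Hypothetical helper; the name and the record layout are assumptions.
public static class BinaryDump
{
    // Writes a record count followed by each member as a raw 8-byte double.
    public static void Write(Stream output, Sample[] items)
    {
        var writer = new BinaryWriter(output);
        writer.Write(items.Length);   // lets the reader know how many records follow
        foreach (var x in items)
        {
            writer.Write(x.A);
            writer.Write(x.B);
            writer.Write(x.C);
        }
        writer.Flush();               // the caller owns and disposes the stream
    }

    // Reads back exactly what Write produced.
    public static Sample[] Read(Stream input)
    {
        var reader = new BinaryReader(input);
        var items = new Sample[reader.ReadInt32()];
        for (int i = 0; i < items.Length; i++)
        {
            items[i].A = reader.ReadDouble();
            items[i].B = reader.ReadDouble();
            items[i].C = reader.ReadDouble();
        }
        return items;
    }
}
```

The values round-trip exactly because nothing is converted to decimal text, but the resulting file cannot be opened directly in a text editor or spreadsheet.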

Kris Vandermotten
2

If writing to the text file directly is possible, a StreamWriter can be used to write the struct members through its strongly typed overloads. I haven't tested memory usage, but I reckon it should be efficient:

        using (var tw = new System.IO.StreamWriter("filename", true)) //(true to append to file)
        {
            tw.Write(x.A);
            tw.Write(';');
        }

If a StringBuilder is required, its strongly typed overloads can also be called:

        builder.Append(x.A) //strongly typed as long as the field is a system type
            .Append(';')
            .Append(x.B)
            .Append(';'); 

Of course both methods would look better implementing some sort of loop or delegates, but that's beside the boxing logic.

EDIT: custom double-writing code is posted in another answer to this question (c# double to character array or alternative).

Me.Name
  • tw.Write(x.A) would call ToString() under the hood, I believe: http://msdn.microsoft.com/en-us/library/ek5h49e6.aspx – user555265 Jun 06 '12 at 13:03
  • Right you are, and so do the stringbuilder overloads. I didn't look that far assuming that those strongly typed would have custom behavior for improved performance. I've fiddled around a bit and added another answer with some custom tryout. – Me.Name Jun 06 '12 at 20:04
1

You should write directly to the file stream to reduce memory utilization.

        using(var writer = new StreamWriter(...))
        {
           writer.Write(x.A);
           writer.Write(";");
           writer.Write(x.B);
           writer.Write(";");
           writer.Write(x.C);
        }
Viacheslav Smityukh
  • but this would just call ToString as well though i think: http://msdn.microsoft.com/en-us/library/ek5h49e6.aspx – user555265 Jun 06 '12 at 13:02
  • ToString() will be called anyway, but this way avoids the additional arrays and strings introduced by the StringBuilder. – Viacheslav Smityukh Jun 06 '12 at 13:05
  • yep that's probably a fair point. In my particular case it really is (from using CLRProfiler to profile allocations) the ToString() operations which are allocating lots of memory so those are what i really want to optimise – user555265 Jun 06 '12 at 13:10
  • Yep .. i basically write out a CSV which gets opened in excel later – user555265 Jun 06 '12 at 13:17
  • I have investigated it; there is no way to write a double's content without a string allocation. You can write your own code to convert a double to a char array, but I'm sure this is overhead! – Viacheslav Smityukh Jun 06 '12 at 13:18
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/12199/discussion-between-viacheslav-smityukh-and-user555265) – Viacheslav Smityukh Jun 06 '12 at 13:19
1

Are you sure that it is the (presumably many) calls to Double.ToString that are causing your memory problems? Each string should be collected on the next generation 0 collection, and the .NET garbage collector is pretty efficient at doing this.

If the strings you create are 85,000 bytes or larger (roughly 42,500 characters, since .NET strings are UTF-16) they will be created on the large object heap, and this may increase the total memory required by your application even though the large strings only exist transiently (large object heap fragmentation).

You can use Performance Monitor to learn more about how your application uses the managed heap. You have used CLRProfiler, which is an even more advanced tool, so maybe you won't learn anything new.

StringBuilder is the right class for building strings in memory, but if you only build the strings in memory to later write them to a file, you should instead write directly to the file using a StreamWriter.

StringBuilder will have to extend the buffer used to store the string and you can avoid this extra overhead by setting the capacity of the StringBuilder in advance (you already do this in your sample code).

No matter what overload you call to format a Double into a StringBuilder, the call will eventually result in Double.ToString being called. StringBuilder.AppendFormat formats directly into the buffer without allocating an extra formatted string, so in terms of memory usage StringBuilder.AppendFormat is just as fine as StringBuilder.Append, and both overloads will allocate a string with the formatted Double as part of the formatting process. However, StringBuilder.AppendFormat will box the Double because it accepts a params Object[] array. Using the StringBuilder.Append overload that accepts a Double does not suffer from this problem.
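To make the difference concrete, here is a small sketch of the two call styles. The `AppendDemo` class and its method names are made up for illustration. Both methods produce the same text; the first passes the doubles as `object` and therefore boxes them, while the second uses the `Append(double)` overload and does not.

```csharp
using System.Text;

// Hypothetical demo class; the names are assumptions, not from the question.
public static class AppendDemo
{
    // The arguments are passed as object, so each double is boxed first.
    public static string WithAppendFormat(double a, double b)
    {
        var sb = new StringBuilder(64);
        sb.AppendFormat("{0};{1};", a, b);
        return sb.ToString();
    }

    // Append(double) takes the value directly, so no boxing happens here.
    public static string WithAppend(double a, double b)
    {
        var sb = new StringBuilder(64);
        sb.Append(a).Append(';').Append(b).Append(';');
        return sb.ToString();
    }
}
```

Both variants format with the current culture, so the decimal separator depends on the machine's settings unless a culture is passed explicitly.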

If you know with certainty that Double.ToString is the source of your memory problems, I believe that your best option is to write your own floating point formatting code that can write a floating point number directly to a StringBuilder. The task is non-trivial, but you could get inspiration from an open source C library.
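One possible shape for such formatting code is sketched below, writing into a caller-supplied char array at a given index as the question asks. This is only a sketch under stated assumptions, not a general-purpose formatter: `DoubleFormatter` is a hypothetical name, the output always has a fixed number of decimals, the value is rounded half away from zero, and the scaled value must fit in a `long`. NaN, infinities and very large magnitudes are rejected.

```csharp
using System;

// Hypothetical helper; fixed-point output only, no exponent notation.
public static class DoubleFormatter
{
    // Writes value with exactly `decimals` fraction digits into buffer at index.
    // Returns the index just past the last character written.
    public static int Write(double value, int decimals, char[] buffer, int index)
    {
        if (double.IsNaN(value) || double.IsInfinity(value))
            throw new ArgumentOutOfRangeException("value");
        if (decimals < 0 || decimals > 15)
            throw new ArgumentOutOfRangeException("decimals");

        long scale = 1;
        for (int i = 0; i < decimals; i++) scale *= 10;

        // Work on the rounded, scaled magnitude as an integer.
        double rounded = Math.Round(Math.Abs(value) * scale, MidpointRounding.AwayFromZero);
        if (rounded >= 9e18)
            throw new ArgumentOutOfRangeException("value"); // would not fit in a long
        long scaled = (long)rounded;

        // Skip the sign when the value rounds to zero, to avoid "-0.00".
        if (value < 0 && scaled != 0) buffer[index++] = '-';

        long intPart = scaled / scale;
        long fracPart = scaled % scale;

        // Count the integer digits, then fill them in from right to left.
        int digits = 1;
        for (long t = intPart; t >= 10; t /= 10) digits++;
        for (int i = digits - 1; i >= 0; i--)
        {
            buffer[index + i] = (char)('0' + intPart % 10);
            intPart /= 10;
        }
        index += digits;

        if (decimals > 0)
        {
            buffer[index++] = '.';
            for (int i = decimals - 1; i >= 0; i--)
            {
                buffer[index + i] = (char)('0' + fracPart % 10);
                fracPart /= 10;
            }
            index += decimals;
        }
        return index;
    }
}
```

The caller is responsible for making the buffer large enough. A filled buffer can then be handed to `StreamWriter.Write(char[], int, int)`, which avoids creating a string per value.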

Martin Liversage
  • Yep .. in CLR Profiler i can see Double::ToString String() as the culprit .. i think I will follow your suggestion and will post my code once I come up with something offering the desired performance.. thanks! – user555265 Jun 06 '12 at 14:25
1

Out of sheer curiosity about how to go about it, I couldn't resist trying to create a scenario that would write doubles directly. Below is the result. I haven't benchmarked it or anything, but it did work as expected in the (limited) tests I've run.

        double[] test = { 8.99999999, -4, 34.567, -234.2354, 2.34, 500.8 };
        using (var sw = new FileStream(@"c:\temp\test.txt", FileMode.Create))
        {
            using (var bw = new BinaryWriter(sw))
            {
                const byte semicol = 59, minus = 45, dec = 46, b0 = 48;

                Action<double> write = d =>
                {
                    if (d == 0)
                        bw.Write(b0);
                    else
                    {
                        if (d < 0)
                        {
                            bw.Write(minus);
                            d = -d;
                        }

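                        // Start at the highest power of ten in d, but never below 1, so values under 1 get a leading "0."
                        double m = Math.Pow(10d, Math.Max(0d, Math.Truncate(Math.Log10(d))));
                        while(true)
                        {
                            var r = ((decimal)(d / m) % 10); //decimal because of floating point errors
                            if (r == 0 && m < 1) break; //stop only after the integer part is written, so 10 is not cut to "1"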
                            if (m == 0.1)
                                bw.Write(dec); //decimal point
                            bw.Write((byte)(48 + r));         
                            m /= 10d;
                        }
                    }

                    bw.Write(semicol);
                };

                foreach (var d in test)
                    write(d);
            }
        }
Me.Name
  • I think the (r==0) in the while loop would cause you to exit e.g. if you were writing out 8.909 .. on the middle 0 before you wrote the last 9 – user555265 Jun 07 '12 at 06:26
  • Oops, that's because of the parsing to (int) (added that while testing the encountered floating point problems, before the decimal casting), the modulo should always return at least a fraction if there are still decimals to be handled. I've edited the post and removed the (int) casting from var r = (int)((decimal)(d / m) % 10); – Me.Name Jun 07 '12 at 06:37