1

The following console application uses TStringList.SaveToFile to write multiple lines to a text file:

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Classes;
var
  i: Integer;
  a,b,c: Single;
  myString : String;
  myStringList : TStringList;
begin
  try
    Randomize;
    myStringList := TStringList.Create; 
    for i := 0 to 1000000 do
    begin
      a := Random;
      b := Random;
      c := Random;
      myString := FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) + FloatToStr(c);
      myStringList.Add(myString);
    end;
    myStringList.SaveToFile('Output.txt');
    myStringList.Free;
    WriteLn('Done');
    Sleep(10000);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

It takes around 3 seconds to write a >50MB file with 1000001 lines and seems to work fine. However, many people advocate using streams for such processes. What would the stream equivalent be and what are the advantages/disadvantages of using it compared to TStringList.SaveToFile?

bmargulies
Trojanian
  • Using a string list seems like a sound choice here. This will create a large buffer containing the entire contents of the string list. And then it will splat it onto a `TFileStream`. Do you have a problem? If you want to do better, you could try avoiding the string list altogether and writing directly to a stream. You'd want a buffered stream (http://stackoverflow.com/questions/5639531/buffered-files-for-faster-disk-access/5639712#5639712). But we don't know enough about your problem to say what the solution should be. – David Heffernan Jan 30 '13 at 11:04
  • No, I was just wondering if there was any difference between this and TFileStream options. If I was working with a very large file, would an alternative approach be better? Is it not dangerous to create a very large buffer containing the entire contents of a file as opposed to splitting it into sections? How would I achieve the split? – Trojanian Jan 30 '13 at 11:11
  • 50MB may be fine. 1GB would not be. To split it you would use a buffered stream like the one I linked to. And then write on the stream rather than adding to the string list. – David Heffernan Jan 30 '13 at 11:18
  • @David, do you have a blog or personal web site? You've already posted unbelievable amounts of production-ready code, like that buffered `TStream` you linked. It'd be nice to see it all properly indexed. – Cosmin Prund Jan 30 '13 at 11:22
  • @CosminPrund No I don't. I don't have enough time or motivation to do something like that. That stream is nifty though huh?!! – David Heffernan Jan 30 '13 at 11:29
  • @DavidHeffernan, wouldn't want you to take time away from SO for such a project. – Cosmin Prund Jan 30 '13 at 11:45
  • @CosminPrund That's a very perceptive window onto my life ;-) – David Heffernan Jan 30 '13 at 11:49
  • There are some differences between TStringList, TStringStream, TFileStream, TStreamReader/Writer in the area of how they handle different encodings. If you have a specific practical question in that area, post a new question. – Jan Doggen Jan 30 '13 at 12:00
  • FWIW, The latest Delphi has a TBufferedFileStream (or whatever it is called) too. Didn't look, but I assume it is similar to David's. – Rudy Velthuis May 09 '16 at 12:32

2 Answers

4

It may be faster to write directly to a stream. Or it may not. I suggest you try it out and time both options. Writing to a stream looks like this:

for i := 0 to 1000000 do
begin
  a := Random;
  b := Random;
  c := Random;
  myString := FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) + 
    FloatToStr(c) + sLineBreak;
  Stream.WriteBuffer(myString[1], Length(myString)*SizeOf(myString[1]));
end;

To have any hope of this version being fast, you need to use a buffered stream. Try this one: Buffered files (for faster disk access) (http://stackoverflow.com/questions/5639531/buffered-files-for-faster-disk-access/5639712#5639712).

The code above will output UTF-16 text on modern Delphi. If you want to output ANSI text simply declare myString as AnsiString.
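If neither UTF-16 nor ANSI is what you want, one option is to encode each line explicitly before writing it. The following is a minimal sketch, assuming UTF-8 output without a byte order mark is acceptable; the helper name WriteUtf8Line and the program name are made up for illustration, and a plain TFileStream stands in for whatever buffered stream you choose.

```pascal
program Utf8LineDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

// Hypothetical helper (not part of the RTL): encodes Line plus a line break
// as UTF-8 and writes the resulting bytes to Stream.
procedure WriteUtf8Line(Stream: TStream; const Line: string);
var
  Bytes: TBytes;
begin
  Bytes := TEncoding.UTF8.GetBytes(Line + sLineBreak);
  if Length(Bytes) > 0 then
    Stream.WriteBuffer(Bytes[0], Length(Bytes));
end;

var
  Stream: TFileStream;
begin
  // 'Output.txt' is the file name used in the question.
  Stream := TFileStream.Create('Output.txt', fmCreate);
  try
    WriteUtf8Line(Stream, FloatToStr(Random) + #9 + FloatToStr(Random));
  finally
    Stream.Free;
  end;
end.
```

Writing the bytes of a TBytes array sidesteps the question of how wide a Char is, at the cost of one extra allocation per line.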

I'll let you do the timing, but my guess is that this variant performs similarly to the string list. I suspect that the time is spent calling Random and FloatToStr. I expect that the file saving with the string list is already very fast.

Putting speed to one side, there is another benefit of this approach. In the string list approach, as per the code in the question, the entire content of the text file is stored in memory. And when you save the file, another copy is made as part of the save procedure. So you will have two copies of the entire file in memory.

In contrast, when saving directly to a stream, the only memory requirement is whatever buffer your stream class uses. For a 50MB file, as in the question, there's likely no real problem with either approach. For a much larger file, you will run into out-of-memory errors if you try to hold the entire file in memory.


Personally though, I'd consider making use of the TStreamWriter class. This useful class separates the concerns of writing data (text, values etc.) from the concern of pushing to a stream. Your code would become:

Writer := TStreamWriter.Create(Stream);//use whatever stream you like
try
  for i := 0 to 1000000 do
  begin
    a := Random;
    b := Random;
    c := Random;
    Writer.WriteLine(FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) +
      FloatToStr(c));
  end;
finally
  Writer.Free;
end;

The TStreamWriter implements buffering with a 1KB buffer so you can use TFileStream and expect to get reasonable performance.
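To make the pairing with TFileStream concrete, here is a minimal sketch of a complete program, assuming the same 'Output.txt' file name as the question. The procedure name WriteRandomLines is made up for illustration. The writer is created with the single-argument constructor, so it uses its default encoding and does not take ownership of the stream.

```pascal
program StreamWriterDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

// Hypothetical helper: writes Count tab-separated lines of three random
// numbers to any stream through a TStreamWriter.
procedure WriteRandomLines(Stream: TStream; Count: Integer);
var
  Writer: TStreamWriter;
  i: Integer;
begin
  Writer := TStreamWriter.Create(Stream);
  try
    for i := 1 to Count do
      Writer.WriteLine(FloatToStr(Random) + #9 + FloatToStr(Random) + #9 +
        FloatToStr(Random));
  finally
    // Free the writer before the stream so pending text reaches the stream.
    Writer.Free;
  end;
end;

var
  FileStream: TFileStream;
begin
  Randomize;
  // The file name is chosen here, when the stream is created.
  FileStream := TFileStream.Create('Output.txt', fmCreate);
  try
    WriteRandomLines(FileStream, 1000001);
  finally
    FileStream.Free;
  end;
  WriteLn('Done');
end.
```

Because the helper only sees a TStream, the same code can target a memory stream in a test or a buffered file stream in production without changes.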


I would recommend that you choose the technique that leads to the most readable code. If performance becomes an issue you can optimise that later. My personal preference would be for TStreamWriter. This gives very clean and readable code, yet also excellent separation of content generation from streaming. The performance is perfectly reasonable also.

David Heffernan
  • Thank you for this. What does `myString[1]` return? i.e. Which version of the overloaded `WriteBuffer` procedure does your first example call? – Trojanian Jan 31 '13 at 10:53
  • `myString[1]` returns `Char` or `AnsiChar` depending on the type of `myString`. It calls the untyped parameter `WriteBuffer` overload, the one whose parameters are `const Buffer; Count: Longint`. – David Heffernan Jan 31 '13 at 11:08
  • How do I assign the file location in the TStreamWriter method? – Trojanian Jan 31 '13 at 12:40
  • The `TStreamWriter` object has no knowledge of that. It just works with a stream. When you create the stream, create a `TFileStream` and at that point you assign the file name. – David Heffernan Jan 31 '13 at 13:27
3

A TFileStream based solution would look as follows, but there are some important points:

  • The TFileStream code is slower. There's no buffering in TFileStream, and writing 20 bytes at a time to a file is not efficient. The TStringList buffers everything in RAM and saves it all at once. That's optimal for speed, but it uses a lot of RAM.
  • In the TStringList-based variant, 50% of the time is spent in Random, as expected.
  • For the TFileStream solution to become more efficient, you'd need to roll a buffering scheme so that you write a reasonable amount to disk each time (for example, 4 KB).

Code:

program Project9;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  SysUtils,
  Classes,
  DateUtils;
var
  i: Integer;
  a,b,c: Single;
  myString : AnsiString;
  StartTime: TDateTime;
  F: TFileStream;
begin
  try
    Randomize;
    StartTime := Now;
    F := TFileStream.Create('Output.txt', fmCreate);
    try
      for i := 0 to 1000000 do
      begin
        a := Random;
        b := Random;
        c := Random;
        myString := AnsiString(Format('%f'#9'%f'#9'%f'#13#10, [a, b, c]));
        F.WriteBuffer(myString[1], Length(myString));
      end;
    finally
      F.Free;
    end;
    // Report elapsed time in milliseconds, reading the clock only once.
    WriteLn('Done. ', MilliSecondsBetween(Now, StartTime), ' ms');
    ReadLn;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
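
The buffering scheme mentioned in the last bullet point could be sketched roughly as follows. This is only an illustration under stated assumptions: the 4096-byte threshold, the names FlushBuffer and BufferedWrite, and the use of a TMemoryStream as the staging buffer are choices made for this example, not part of the RTL. Recent Delphi versions also ship a ready-made buffered file stream, which may be a better fit where available.

```pascal
program BufferedWriteDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Classes;

const
  FlushThreshold = 4096; // assumed chunk size; tune by measurement

// Hypothetical helper: writes the bytes staged so far to Target and rewinds
// the staging buffer. Position, not Size, tracks how much is pending.
procedure FlushBuffer(Buffer: TMemoryStream; Target: TStream);
var
  Pending: Integer;
begin
  Pending := Integer(Buffer.Position);
  if Pending > 0 then
  begin
    Target.WriteBuffer(Buffer.Memory^, Pending);
    Buffer.Position := 0;
  end;
end;

// Hypothetical helper: stages Line in memory and flushes once the staged
// data reaches FlushThreshold bytes.
procedure BufferedWrite(Buffer: TMemoryStream; Target: TStream;
  const Line: AnsiString);
begin
  if Length(Line) = 0 then
    Exit;
  Buffer.WriteBuffer(Line[1], Length(Line));
  if Buffer.Position >= FlushThreshold then
    FlushBuffer(Buffer, Target);
end;

var
  i: Integer;
  a, b, c: Single;
  F: TFileStream;
  Buffer: TMemoryStream;
begin
  Randomize;
  F := TFileStream.Create('Output.txt', fmCreate);
  Buffer := TMemoryStream.Create;
  try
    for i := 0 to 1000000 do
    begin
      a := Random;
      b := Random;
      c := Random;
      BufferedWrite(Buffer, F,
        AnsiString(Format('%f'#9'%f'#9'%f'#13#10, [a, b, c])));
    end;
    // Write whatever is still staged after the last line.
    FlushBuffer(Buffer, F);
  finally
    Buffer.Free;
    F.Free;
  end;
  WriteLn('Done');
end.
```

The final FlushBuffer call matters: without it, up to one threshold's worth of data would be lost when the program ends.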
Cosmin Prund
  • With a stream you should call `WriteBuffer` rather than `Write`. Let `WriteBuffer` call `Write` and then handle any errors by raising. Or call `Write` if you must, but add the error handling. – David Heffernan Jan 30 '13 at 11:37
  • @DavidHeffernan, for this particular case `WriteBuffer` manages to be 3% slower than plain `Write`! Nonetheless, `WriteBuffer` it will be: better to get an error than have things silently fail. – Cosmin Prund Jan 30 '13 at 11:43
  • You can always call `Write` if you wish and do the error checking yourself. That might be faster, but 3% isn't much is it? – David Heffernan Jan 30 '13 at 11:56
  • @CosminPrund Thank you for this answer. This information is helpful. What does myString[1] return? – Trojanian Jan 31 '13 at 10:48
  • `myString[1]` is the first character in the string. – David Heffernan Jan 31 '13 at 10:50
  • @Trojanian, normally `myString[1]` would return the first character in the string. In this particular case, since it's passed to a function that takes an untyped parameter, it essentially passes a pointer to the part of the string that contains the actual data (there's a hidden part to the string for reference counting and other metadata that shouldn't concern you right now). – Cosmin Prund Jan 31 '13 at 11:07
  • If you were to pass `myString` as a parameter, since the function's parameter is untyped, you'd actually pass a pointer to a pointer to the string data. Not a mistake: `string` variables are actually pointers, but compiler magic makes them look like values. When dealing with the internals of a string (as in this particular case) this metaphor fails. – Cosmin Prund Jan 31 '13 at 11:12