12

i want to process a text file line by line. In the olden days i loaded the file into a StringList:

slFile := TStringList.Create();
slFile.LoadFromFile(filename);

for i := 0 to slFile.Count-1 do
begin
   oneLine := slFile.Strings[i];
   //process the line
end;

Problem with that is once the file gets to be a few hundred megabytes, i have to allocate a huge chunk of memory, when really i only need enough memory to hold one line at a time. (Plus, you can't really indicate progress when the system is locked up loading the file in step 1.)

Then i tried using the native, and recommended, file I/O routines provided by Delphi:

var
   f: TextFile;
begin
   AssignFile(f, filename);
   Reset(f);
   while not Eof(f) do
   begin
       ReadLn(f, oneLine);
       //process the line
   end;
   CloseFile(f);
end;

Problem with Assign is that there is no option to read the file without locking (i.e. fmShareDenyNone). The former stringlist example doesn't support no-lock either, unless you change it to LoadFromStream:

slFile := TStringList.Create;
stream := TFileStream.Create(filename, fmOpenRead or fmShareDenyNone);
try
   slFile.LoadFromStream(stream);
finally
   stream.Free;
end;

for i := 0 to slFile.Count-1 do
begin
   oneLine := slFile.Strings[i];
   //process the line
end;

So now even though i've gained no locks being held, i'm back to loading the entire file into memory.

Is there some alternative to Assign/ReadLn, where i can read a file line-by-line, without taking a sharing lock?

i'd rather not get directly into Win32 CreateFile/ReadFile, and having to deal with allocating buffers and detecting CR, LF, CRLF's.

i thought about memory mapped files, but there's the difficulty if the entire file doesn't fit (map) into virtual memory, and having to map views (pieces) of the file at a time. Starts to get ugly.

i just want Reset with fmShareDenyNone!

Ian Boyd

7 Answers

16

With recent Delphi versions, you can use TStreamReader. Construct it with your file stream, and then call its ReadLine method (inherited from TTextReader).

An option for all Delphi versions is to use Peter Below's StreamIO unit, which gives you AssignStream. It works just like AssignFile, but for streams instead of file names. Once you've used that function to associate a stream with a TextFile variable, you can call ReadLn and the other I/O functions on it just like any other file.
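For the first option, a minimal sketch might look like the following. It assumes a Unicode Delphi (2009 or later) where TStreamReader lives in the Classes unit; the ProcessFile name and the line counter are only there for illustration. The file stream is opened with fmShareDenyNone, which is the sharing mode the question asks for.

```pascal
program ReadLinesNoLock;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: reads FileName line by line and returns the line count.
function ProcessFile(const FileName: string): Integer;
var
  Stream: TFileStream;
  Reader: TStreamReader;
  Line: string;
begin
  Result := 0;
  // fmShareDenyNone: other processes may keep reading and writing the file
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    // The reader does not own the stream by default, so both are freed here
    Reader := TStreamReader.Create(Stream);
    try
      while not Reader.EndOfStream do
      begin
        Line := Reader.ReadLine;
        // process Line here
        Inc(Result);
      end;
    finally
      Reader.Free;
    end;
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount > 0 then
    Writeln(ProcessFile(ParamStr(1)), ' lines');
end.
```

Only one line is held in memory at a time (plus the reader's internal buffer). If the file is not in the reader's default encoding, one of the other TStreamReader constructor overloads that takes a TEncoding can be used instead.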

Rob Kennedy
  • 3
    TStreamReader would be great if it weren't *so bloody slow*. Get Uffe Kousgaard's [Text reading "benchmark"](http://cc.embarcadero.com/Item/27692) from CodeCentral and add in a TStreamReader implementation. Run it and watch your CPU burn. It's not even I/O bound. – afrazier May 12 '10 at 12:49
  • 1
    I wrote a really nice alternative that is REALLY FAST. It's built into the source code for the TJvCsvDataSet in the Jedi JVCL. – Warren P Apr 06 '11 at 21:36
  • 4
    Please use words, Ian, not just links. I don't know what you're trying to communicate otherwise. – Rob Kennedy Apr 22 '12 at 21:33
4

You can use this sample code:

type
  // Buffered line reader over any TStream; takes ownership of the host stream.
  // The data is treated as single-byte (ANSI) text.
  TTextStream = class(TObject)
  private
    FHost: TStream;
    FOffset, FSize: Integer;
    FBuffer: array[0..1023] of AnsiChar;
    FEOF: Boolean;
    function FillBuffer: Boolean;
  protected
    property Host: TStream read FHost;
  public
    constructor Create(AHost: TStream);
    destructor Destroy; override;
    function ReadLn: string; overload;
    function ReadLn(out Data: string): Boolean; overload;
    property EOF: Boolean read FEOF;
    property HostStream: TStream read FHost;
    property Offset: Integer read FOffset write FOffset;
  end;

{ TTextStream }

constructor TTextStream.Create(AHost: TStream);
begin
  inherited Create;
  FHost := AHost;
  FillBuffer;
end;

destructor TTextStream.Destroy;
begin
  FHost.Free;
  inherited Destroy;
end;

function TTextStream.FillBuffer: Boolean;
begin
  FOffset := 0;
  FSize := FHost.Read(FBuffer, SizeOf(FBuffer));
  Result := FSize > 0;
  FEOF := not Result; // end of file once nothing more can be read
end;

function TTextStream.ReadLn(out Data: string): Boolean;
var
  Len, Start: Integer;
  EOLChar: AnsiChar;
  Line: AnsiString;
begin
  Data := '';
  Line := '';
  Result := False;
  repeat
    if FOffset >= FSize then
      if not FillBuffer then
      begin
        Data := string(Line); // final line without a trailing line break
        Exit; // no more data to read from stream -> exit
      end;
    Result := True;
    Start := FOffset;
    while (FOffset < FSize) and (not (FBuffer[FOffset] in [#13, #10])) do
      Inc(FOffset);
    Len := FOffset - Start;
    if Len > 0 then
    begin
      // append this chunk; a line may span several buffer fills
      SetLength(Line, Length(Line) + Len);
      Move(FBuffer[Start], Line[Succ(Length(Line) - Len)], Len);
    end;
  until FOffset <> FSize; // EOL char found
  Data := string(Line);
  EOLChar := FBuffer[FOffset];
  Inc(FOffset);
  if FOffset = FSize then
    if not FillBuffer then
      Exit;
  // treat CRLF (or LFCR) as a single line break
  if FBuffer[FOffset] in ([#13, #10] - [EOLChar]) then
  begin
    Inc(FOffset);
    if FOffset = FSize then
      FillBuffer;
  end;
end;

function TTextStream.ReadLn: string;
begin
  ReadLn(Result);
end;

Usage:

procedure ReadFileByLine(Filename: string);
var
  sLine: string;
  tsFile: TTextStream;
begin
  tsFile := TTextStream.Create(TFileStream.Create(Filename, fmOpenRead or fmShareDenyWrite));
  try
    while tsFile.ReadLn(sLine) do
    begin
      //sLine is your line
    end;
  finally
    tsFile.Free;
  end;
end;
Linas
3

It seems the FileMode variable has no effect on text files, but my tests showed that reading the same file from multiple processes is no problem. You didn't mention it in your question, but if you are not going to write to the text file while it is being read, you should be fine.

Uwe Raabe
  • -1. Even for non-text-files, all but the lower two bits of `FileMode` get *masked out* when you call `Reset`, so the sharing flags are ignored then, too. – Rob Kennedy May 12 '10 at 08:19
  • 1
    Did you really try it? I made a simple application that opens a textfile with fmOpenRead + fmShareDenyWrite, reads one line with each button click and adds it to a TMemo. I can execute the app two times and read the file simultaneously. In addition writing to the file is prohibited. If someone is interested I can edit my answer to include the relevant sourcecode. BTW, tested with D2010. – Uwe Raabe May 12 '10 at 20:41
  • I just made another test: it works even without fmShareDenyWrite. The only drawback I encountered so far is that it seems impossible to write to the file while it is open for reading (even with fmShareDenyNone), but reading from multiple processes seems no problem. – Uwe Raabe May 12 '10 at 20:48
3

If you need support for ansi and Unicode in older Delphis, you can use my GpTextFile or GpTextStream.

gabr
2

What I do is use a TFileStream but I buffer the input into fairly large blocks (e.g. a few megabytes each) and read and process one block at a time. That way I don't have to load the whole file at once.

It works quite quickly that way, even for large files.

I do have a progress indicator. As I load each block, I increment it by the fraction of the file that has additionally been loaded.

Reading one line at a time, without something to do your buffering, is simply too slow for large files.
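A rough sketch of that block-wise approach, not the actual code described above: the 1 MB BlockSize, the TLineProc callback type and the ProcessFileInBlocks name are all assumptions made for the example. It reads one block at a time, splits it on CR, LF or CRLF, carries an unfinished line over to the next block, and reports progress from the stream position.

```pascal
program BlockReader;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  BlockSize = 1024 * 1024; // hypothetical block size; tune as needed

type
  TLineProc = procedure(const Line: AnsiString);

var
  LineCount: Integer = 0;

procedure CountLine(const Line: AnsiString);
begin
  Inc(LineCount); // stands in for real per-line processing
end;

procedure ProcessFileInBlocks(const FileName: string; OnLine: TLineProc);
var
  Stream: TFileStream;
  Block: AnsiString;    // raw bytes of the current block
  Pending: AnsiString;  // unfinished line carried over from the previous block
  BytesRead, I, LineStart: Integer;
  SkipLF: Boolean;      // True right after a CR, so a following LF is swallowed
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    SetLength(Block, BlockSize);
    Pending := '';
    SkipLF := False;
    repeat
      BytesRead := Stream.Read(Block[1], BlockSize);
      LineStart := 1;
      I := 1;
      while I <= BytesRead do
      begin
        if SkipLF then
        begin
          SkipLF := False;
          if Block[I] = #10 then
          begin
            Inc(I);
            LineStart := I;
            Continue;
          end;
        end;
        if Block[I] in [#13, #10] then
        begin
          OnLine(Pending + Copy(Block, LineStart, I - LineStart));
          Pending := '';
          SkipLF := Block[I] = #13;
          LineStart := I + 1;
        end;
        Inc(I);
      end;
      // keep the unfinished tail for the next block
      Pending := Pending + Copy(Block, LineStart, BytesRead - LineStart + 1);
      if Stream.Size > 0 then
        Write(#13, 'Progress: ', Stream.Position * 100 div Stream.Size, '%');
    until BytesRead = 0;
    if Pending <> '' then
      OnLine(Pending); // last line had no trailing line break
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount > 0 then
  begin
    ProcessFileInBlocks(ParamStr(1), CountLine);
    Writeln;
    Writeln(LineCount, ' lines');
  end;
end.
```

Memory use stays at roughly one block plus the longest line, and the progress figure costs nothing extra because the stream already knows its position and size.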

lkessler
1

I had the same problem a few years ago, especially the problem of locking the file. What I did was use the low-level ReadFile from the Windows API. I know the question is old (2 years at the time of my answer), but perhaps my contribution could help someone in the future.

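const
  BUFF_SIZE = $8000;
var
  dwread: LongWord;
  hFile: THandle;
  datafile: array [0..BUFF_SIZE-1] of AnsiChar;
  apos: Integer;
  myEOF: Boolean;
begin
  // FILE_SHARE_READ or FILE_SHARE_WRITE lets other processes keep reading and writing the file
  hFile := CreateFile(PChar(filename), GENERIC_READ, FILE_SHARE_READ or FILE_SHARE_WRITE, nil, OPEN_EXISTING, FILE_ATTRIBUTE_READONLY, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  SetFilePointer(hFile, 0, nil, FILE_BEGIN);
  myEOF := false;
  try
    ReadFile(hFile, datafile, BUFF_SIZE, dwread, nil);
    while (dwread > 0) and (not myEOF) do
    begin
      if dwread = BUFF_SIZE then
      begin
        // a full block usually ends mid-line: find the last line break and
        // seek back so the next read starts with the incomplete line
        apos := LastDelimiter(#10#13, datafile);
        if apos = BUFF_SIZE then inc(apos);
        SetFilePointer(hFile, apos-BUFF_SIZE, nil, FILE_CURRENT);
      end
      else myEOF := true;
      // process the complete lines in datafile here
      ReadFile(hFile, datafile, BUFF_SIZE, dwread, nil);
    end;
  finally
    CloseHandle(hFile);
  end;
end;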

For me the speed improvement appeared to be significant.

HpTerm
0

Why not simply read the lines of the file directly from the TFileStream itself, one at a time?

i.e. (in pseudocode):

  readline: 
    while NOT EOF and (readchar <> EOL) do
      appendchar to result


  while NOT EOF do
  begin
    s := readline
    process s
  end;

One problem you may find with this is that iirc TFileStream is not buffered so performance over a large file is going to be sub-optimal. However, there are a number of solutions to the problem of non-buffered streams, including this one, that you may wish to investigate if this approach solves your initial problem.
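As one possible way to get buffering, here is a minimal sketch that reads byte by byte from a buffered stream. It assumes a newer Delphi that ships TBufferedFileStream in the Classes unit (10.1 Berlin onward, as far as I know); on older versions any buffering TStream descendant could be swapped in. The ReadLine and CountLines names are made up for the example, and the text is assumed to be single-byte.

```pascal
program BufferedReadLine;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Reads one line (ended by CR, LF or CRLF) from the current stream position.
// Returns False only when there were no bytes left to read.
function ReadLine(Stream: TStream; out Line: AnsiString): Boolean;
var
  Ch: AnsiChar;
begin
  Line := '';
  Result := False;
  while Stream.Read(Ch, 1) = 1 do
  begin
    Result := True;
    if Ch = #10 then
      Exit;
    if Ch = #13 then
    begin
      // swallow the LF of a CRLF pair; step back over anything else
      if (Stream.Read(Ch, 1) = 1) and (Ch <> #10) then
        Stream.Position := Stream.Position - 1;
      Exit;
    end;
    Line := Line + Ch;
  end;
end;

// Hypothetical driver: counts the lines of FileName without locking the file.
function CountLines(const FileName: string): Integer;
var
  Stream: TStream;
  Line: AnsiString;
begin
  Result := 0;
  // assumption: TBufferedFileStream is available in this Delphi version
  Stream := TBufferedFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    while ReadLine(Stream, Line) do
      Inc(Result); // process Line here
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount > 0 then
    Writeln(CountLines(ParamStr(1)), ' lines');
end.
```

The cases that are easy to get wrong are the CRLF pair, a final line without a line break, and empty lines, so those are handled explicitly in ReadLine.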

Deltics
  • The reason i don't really want to do that is because it's not easy to get right. For example, your pseudo-code has 3 subtle bugs. And so rather than re-invent a buggy wheel, i'd rather use canned, tested, code. – Ian Boyd May 12 '10 at 11:40
  • 1
    How can it include bugs? It's pseudo-code intended to illustrate an idea, not REAL code!! How you implement the real code will determine whether it contains bugs or not. You want to process a file *whilst* reading it from the disc, rather than processing after reading the entire content, then processing it while STREAMing is exactly what you need (you will note that all other answers are variations on this theme!). If you already had an idea of what answer you wanted to hear, why bother even asking the question? – Deltics May 12 '10 at 21:01
  • pseudo-code is used to show an algorithm, without the nuisance of dealing with a particular language. In this case the algorithm is flawed. – Ian Boyd Jun 02 '10 at 13:43
  • 1
    I'm sorry, but if you need to be spoon-fed complete working code even in pseudo-code form in a forum such as this then imho you should be looking for another job. Software development clearly isn't your thing (and if you're good enough to spot the subtle logical flaws, then you're good enough to write the REAL code without those flaws). FFS – Deltics Jun 02 '10 at 21:03
  • 1
    The only reason i was able to spot the bugs at the time was that i had spent hours on the problem. If i were to write it today i would get it wrong. And i don't want to get the code wrong; i'd rather use a trusted piece of highly tested code. (i.e. why reinvent the wheel). i think part of being a good programmer is recognizing problems before they happen. – Ian Boyd Mar 17 '11 at 14:08