I read UTF-8 content from XML files and then need to save it and reload it on demand. I'm converting from AssignFile/Writeln/Readln to the buffered stream classes by David Heffernan: Buffered files (for faster disk access)

I have simple new WriteLn and ReadLn procedures. WriteLn works, but I can't make ReadLn work.

My concept for ReadLn is to process:

  1. Read buffer
  2. Find Line break
  3. Get Text from PrevPos to CurrPos-1
  4. Save rest of buffer to add to first line of next Read buffer

New WriteLn procedure:

{ * New WriteLn * }
procedure TForm1.Button2Click(Sender: TObject);
var FileOut: TWriteCachedFileStream;
  vText: string;
  vUTF8Text: RawByteString;
begin
  FileOut := TWriteCachedFileStream.Create('c:\tmp\file.txt');
  try
    vText := 'Delphi';
    vUTF8Text := Utf8Encode(vText + sLineBreak);
    FileOut.WriteBuffer(PAnsichar(vUTF8Text)^, Length(vUTF8Text));
    vText := 'VB源码';
    vUTF8Text := Utf8Encode(vText + sLineBreak);
    FileOut.WriteBuffer(PAnsichar(vUTF8Text)^, Length(vUTF8Text));
    vText := 'Java源码';
    vUTF8Text := Utf8Encode(vText + sLineBreak);
    FileOut.WriteBuffer(PAnsichar(vUTF8Text)^, Length(vUTF8Text));
  finally
    FileOut.Free;
  end;
end;

But I have a problem with reading: the call to Read fails while copying from the file into my buffer. The error occurs in the Read function of TReadOnlyCachedFileStream.

The error message is:

"Project Project1.exe raised exception class $C0000005 with message 'access violation at 0x004069ca: write of address 0x00000010'."


function TReadOnlyCachedFileStream.Read(var Buffer; Count: Longint): Longint;
begin
  ...
  Move(CachePtr^, BufferPtr^, NumOfBytesToCopy); { <- Error occurs here }
  ...
end;

And here is my ReadLn procedure. It is not working yet, because I can't get past the error:

{ * New ReadLn * }
procedure TForm1.Button3Click(Sender: TObject);
var FileIn: TReadOnlyCachedFileStream;
  vLinesCounter, vCurrPos, vPrevPos: integer;
  vBuffer: TBytes;
  vUTF8Text, vPrevUTF8Text: string;
  vFilesize,vBytesRead,vNumberOfBytes: Int64;
  vCh: Char;
begin
  vLinesCounter := 0;
  FileIn := TReadOnlyCachedFileStream.Create('c:\tmp\file.txt');
  try
    vFilesize := FileIn.Size;
    while FileIn.Position < vFilesize do
    begin
      vBytesRead:=FileIn.Read(vBuffer, 65536);
      vNumberOfBytes := vNumberOfBytes + vBytesRead;
      {1. Find Line break
       2. Get Text from PrevPos to CurrPos-1
       3. Save rest of buffer to add to first line of next Read buffer}
      vCurrPos := 0; vPrevPos := 0;
      while vCurrPos < vBytesRead do
      begin
        vCh:=Chr(vBuffer[vCurrPos]);
        if (vCh = #13) Or (vCh = #10) then { is New line }
        begin
          if vPrevUTF8Text <> '' then
            vUTF8Text := vPrevUTF8Text + TEncoding.UTF8.GetString(vBuffer, vPrevPos, vCurrPos - 1) { Add previous text that was not a separate line }
          else
            vUTF8Text := TEncoding.UTF8.GetString(vBuffer, vPrevPos, vCurrPos - 1);
          vPrevPos := vCurrPos; { Save Pos for next line }
          Inc(vLinesCounter);
          Memo1.Lines.Add(vUTF8Text);
        end;
      end;
      { save rest of text as start of next line }
      if vCurrPos < Length(vBuffer) then
        vPrevUTF8Text := TEncoding.UTF8.GetString(vBuffer, vPrevPos, vCurrPos - 1);
    end;
  finally
    FileIn.Free
  end;
  Memo1.Lines.Add('Lines read: '+IntToStr(vLinesCounter));
end;
Mike Torrettinni
  • The error is because your call to Read is wrong. You aren't allocating a buffer, and even if you were, you'd be reading into the pointer to the buffer rather than the buffer itself. Allocate the buffer with SetLength(vBuffer, 65536) and pass Pointer(vBuffer)^ when calling Read. – David Heffernan Aug 02 '16 at 07:32
  • Beyond that is Memo1.Lines.LoadFromFile really that slow? – David Heffernan Aug 02 '16 at 07:35
  • Memo1 is here just for verifying what is being read, not to be used as main file loading method. OK, now it works with `SetLength` and `Pointer(vBuffer)^`. – Mike Torrettinni Aug 02 '16 at 09:23
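The fix from the comments above, combined with the ReadLn steps from the question, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code the asker ended up with. `ReadUtf8Lines` is a hypothetical helper name, and a plain `TFileStream` stands in for `TReadOnlyCachedFileStream` so that the program compiles without the third-party unit; any `TStream` descendant should work the same way. The sketch allocates the buffer before reading, passes `Pointer(Chunk)^` to `Read`, and passes a byte count (not an end index) as the third argument of `GetString`. It carries the incomplete last line over as raw bytes rather than as a decoded string, because a multi-byte UTF-8 sequence can be split across two reads. It does not strip a UTF-8 BOM.

```pascal
program ReadUtf8LinesDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper (not part of the RTL or of the cached stream classes):
// reads Stream in 64 KB chunks and adds each UTF-8 line to Lines.
procedure ReadUtf8Lines(Stream: TStream; Lines: TStrings);
var
  Chunk, Pending: TBytes;
  BytesRead, OldLen, I, LineStart, Len: Integer;
  Line: string;
begin
  SetLength(Chunk, 65536);            // allocate the buffer before reading
  Pending := nil;
  repeat
    // Pointer(Chunk)^ passes the array contents, not the array reference.
    BytesRead := Stream.Read(Pointer(Chunk)^, Length(Chunk));

    // Append the new bytes to the undecoded tail of the previous chunk.
    OldLen := Length(Pending);
    SetLength(Pending, OldLen + BytesRead);
    if BytesRead > 0 then
      Move(Chunk[0], Pending[OldLen], BytesRead);

    LineStart := 0;
    for I := 0 to High(Pending) do
      if Pending[I] = 10 then         // LF ends a line
      begin
        Len := I - LineStart;
        if (Len > 0) and (Pending[I - 1] = 13) then
          Dec(Len);                   // drop the CR of a CRLF pair
        if Len > 0 then
          Line := TEncoding.UTF8.GetString(Pending, LineStart, Len)
        else
          Line := '';
        Lines.Add(Line);
        LineStart := I + 1;
      end;

    // Keep the incomplete last line as raw bytes for the next round.
    Pending := Copy(Pending, LineStart, Length(Pending) - LineStart);
  until BytesRead = 0;

  // A final line without a trailing line break.
  if Length(Pending) > 0 then
    Lines.Add(TEncoding.UTF8.GetString(Pending));
end;

var
  Strm: TFileStream;   // assumption: stands in for TReadOnlyCachedFileStream
  Lines: TStringList;
begin
  Strm := TFileStream.Create('c:\tmp\file.txt', fmOpenRead or fmShareDenyWrite);
  try
    Lines := TStringList.Create;
    try
      ReadUtf8Lines(Strm, Lines);
      Writeln('Lines read: ', Lines.Count);
    finally
      Lines.Free;
    end;
  finally
    Strm.Free;
  end;
end.
```

The inner loop rescans the carried-over bytes on every read, which keeps the sketch short but is wasteful for very long lines; remembering the scan position would avoid that.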

1 Answer

The RTL has its own TStreamReader and TStreamWriter classes in the System.Classes unit. You should let them do the hard work for you, e.g.:

procedure TForm1.Button2Click(Sender: TObject);
var
  FileOut: TStreamWriter;
begin
  FileOut := TStreamWriter.Create('c:\tmp\file.txt', False, TEncoding.UTF8);
  try
    FileOut.WriteLine('Delphi');
    FileOut.WriteLine('VB源码');
    FileOut.WriteLine('Java源码');
  finally
    FileOut.Free;
  end;
end;

procedure TForm1.Button3Click(Sender: TObject);
var
  FileIn: TStreamReader;
  vLinesCounter: Integer;
begin
  vLinesCounter := 0;
  FileIn := TStreamReader.Create('c:\tmp\file.txt', True);
  try
    while not FileIn.EndOfStream do
    begin
      Memo1.Lines.Add(FileIn.ReadLine);
      Inc(vLinesCounter);
    end;
  finally
    FileIn.Free;
  end;
  Memo1.Lines.Add('Lines read: '+IntToStr(vLinesCounter));
end;

If you want to use David's buffer classes (note that Delphi 10.1 Berlin adds a new TBufferedFileStream class), you can still do that as well, e.g.:

procedure TForm1.Button2Click(Sender: TObject);
var
  FileStrm: TWriteCachedFileStream;
  FileOut: TStreamWriter;
begin
  FileStrm := TWriteCachedFileStream.Create('c:\tmp\file.txt');
  try
    FileOut := TStreamWriter.Create(FileStrm, TEncoding.UTF8);
    try
      FileOut.WriteLine('Delphi');
      FileOut.WriteLine('VB源码');
      FileOut.WriteLine('Java源码');
    finally
      FileOut.Free;
    end;
  finally
    FileStrm.Free;
  end;
end;

procedure TForm1.Button3Click(Sender: TObject);
var
  FileStrm: TReadOnlyCachedFileStream;
  FileIn: TStreamReader;
  vLinesCounter: Integer;
begin
  vLinesCounter := 0;
  FileStrm := TReadOnlyCachedFileStream.Create('c:\tmp\file.txt');
  try
    FileIn := TStreamReader.Create(FileStrm, True);
    try
      while not FileIn.EndOfStream do
      begin
        Memo1.Lines.Add(FileIn.ReadLine);
        Inc(vLinesCounter);
      end;
    finally
      FileIn.Free;
    end;
  finally
    FileStrm.Free;
  end;
  Memo1.Lines.Add('Lines read: '+IntToStr(vLinesCounter));
end;
Remy Lebeau
  • My files get up to 500 MB, so speed is important and I assumed David's Buffered streams are really fast, faster than TStreamWriter. – Mike Torrettinni Aug 02 '16 at 00:12
  • @MikeTorrettinni Don't make assumptions about performance - test! – Disillusioned Aug 02 '16 at 00:28
  • `TStreamReader` and `TStreamWriter` have an internal buffer, so even if the underlying `TStream` is not using buffered I/O, buffering is still being used. And the constructors allow you to specify the buffer size, the default is 1KB. – Remy Lebeau Aug 02 '16 at 00:35
  • Stream reader/writer classes are poorly implemented though and the performance is not what it should be. I seem to recall writing my own variants to read files on a line by line basis. – David Heffernan Aug 02 '16 at 06:30
  • Not readily at the moment. I'm away from my computer for the next week. – David Heffernan Aug 02 '16 at 09:26
  • @RemyLebeau Thank you, this works. I'm testing on some of the example files. – Mike Torrettinni Aug 02 '16 at 12:37
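As a follow-up to the comment about the internal buffer, here is a minimal sketch of passing an explicit buffer size to the `TStreamWriter` and `TStreamReader` constructors. The 64 KB value is an arbitrary assumption for illustration, not a recommendation, and the default buffer size may differ between Delphi versions. Whether a larger buffer helps with 500 MB files is something to measure rather than assume.

```pascal
program StreamReaderBufferDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  FileName = 'c:\tmp\file.txt';
  BufSize  = 64 * 1024;  // assumption: arbitrary size chosen for illustration

var
  Writer: TStreamWriter;
  Reader: TStreamReader;
  LineCount: Integer;
begin
  // Create (overwrite) the file as UTF-8 with an explicit buffer size.
  Writer := TStreamWriter.Create(FileName, False, TEncoding.UTF8, BufSize);
  try
    Writer.WriteLine('Delphi');
    Writer.WriteLine('VB源码');
    Writer.WriteLine('Java源码');
  finally
    Writer.Free;
  end;

  // Read it back line by line, with BOM detection and the same buffer size.
  LineCount := 0;
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8, True, BufSize);
  try
    while not Reader.EndOfStream do
    begin
      Reader.ReadLine;   // the line is discarded here; only the count is kept
      Inc(LineCount);
    end;
  finally
    Reader.Free;
  end;

  Writeln('Lines read: ', LineCount);
end.
```

Timing this loop against the cached-stream variants on a representative file would show whether the buffer size makes a measurable difference.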