I have a binary file (2.5 MB) and I want to find the position of this byte sequence: CD 09 D9 F5. Then I want to write some data after that position and overwrite the old data (4 KB) with zeros.

Here is how I do it now, but it is a bit slow.

procedure ProcessFile(dataToWrite: string);
var
  fileContent: string;
  f: file of char;
  c: char;
  n, i, startIndex, endIndex: integer;
begin
  AssignFile(f, 'file.bin');
  reset(f);
  n := FileSize(f);
  while n > 0 do
  begin
    Read(f, c);
    fileContent := fileContent + c;
    dec(n);
  end;
  CloseFile(f);

  startindex := Pos(Char($CD)+Char($09)+Char($D9)+Char($F5), fileContent) + 4;
  endIndex := startIndex + 4088;

  Seek(f, startIndex);

  for i := 1 to length(dataToWrite) do
    Write(f, dataToWrite[i]);

  c := #0;
  while (i < endIndex) do
  begin
    Write(f, c); inc(i);
  end;

  CloseFile(f);
end;
Alex P.
  • Which part of the code is slow? Have you performed timings? How do you even know it is slow? What speed is it, and what do you expect to be able to achieve? – David Heffernan Mar 27 '13 at 14:58
  • It's obvious that reading and writing a file char by char is slow. At least fetch the data into a buffer in larger chunks (see BlockRead). – OnTheFly Mar 27 '13 at 15:12
  • @DavidHeffernan, yes, the part where it searches for the sequence position is slow. It now takes about 15 seconds for 5 files; I want it to take 1-3 seconds at most. If I comment that part out and just set StartIndex to 9999, for example, it is instant. I don't think reading the whole file byte by byte as chars and appending each one to a string is the best solution. – Alex P. Mar 27 '13 at 15:23
  • @DavidHeffernan, compile time. I know which part is slow; I wrote about that above: reading byte by byte into a string, and perhaps searching in that string. I hope BlockRead will solve my problem... – Alex P. Mar 27 '13 at 18:04
  • Oh good grief, I had not seen that you did that. That's absolutely crazy! I would abandon Pascal I/O. Use streams instead. – David Heffernan Mar 27 '13 at 18:07

2 Answers

See this answer: Fast read/write from file in delphi

Some options are:

To search the file buffer, see Best way to find position in the Stream where given byte sequence starts - one answer mentions the Boyer-Moore algorithm for fast detection of a byte sequence.
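
For illustration, here is a minimal sketch of the Horspool simplification of Boyer-Moore, applied to a buffer that is already in memory. The helper name FindBytes and the use of TBytes are my own assumptions, not taken from the linked answers; treat it as a starting point rather than a tuned implementation.

```pascal
program FindPatternDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the 0-based offset of the first occurrence
// of Pattern in Data, or -1 if it does not occur.
function FindBytes(const Data, Pattern: TBytes): Integer;
var
  Skip: array[Byte] of Integer;
  i, j, m, n: Integer;
begin
  Result := -1;
  m := Length(Pattern);
  n := Length(Data);
  if (m = 0) or (n < m) then
    Exit;

  // By default a mismatch lets us shift by the whole pattern length
  for i := 0 to 255 do
    Skip[i] := m;
  // Bytes that occur in the pattern (except at its last position)
  // only allow a shift up to their last occurrence
  for i := 0 to m - 2 do
    Skip[Pattern[i]] := m - 1 - i;

  i := 0;
  while i <= n - m do
  begin
    // Compare the window right to left
    j := m - 1;
    while (j >= 0) and (Data[i + j] = Pattern[j]) do
      Dec(j);
    if j < 0 then
    begin
      Result := i;
      Exit;
    end;
    // Shift according to the byte under the last position of the window
    Inc(i, Skip[Data[i + m - 1]]);
  end;
end;

var
  Data, Pattern: TBytes;
begin
  // Made-up sample data containing the sequence from the question
  Data := TBytes.Create($00, $CD, $09, $D9, $F5, $11);
  Pattern := TBytes.Create($CD, $09, $D9, $F5);
  Writeln(FindBytes(Data, Pattern));
end.
```

With a pattern of only four bytes the gain over a plain scan is modest; the longer the pattern, the larger the shifts become.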

mjn
  • I would just read a block, scan it for the first byte, evaluate the rest, short-circuiting when appropriate. Dealing with the start of the sequence appearing at the end of a block seems to be the most obvious edge condition here. – Leonardo Herrera Mar 27 '13 at 17:19

Your code to read the entire file into a string is very wasteful. Pascal I/O is buffered, so I don't think the byte-by-byte reading is the main culprit, although one big read would be better. The main problem is the string concatenation: appending one character at a time puts extreme demand on the heap allocator.

I'd do it like this:

function LoadFileIntoString(const FileName: string): AnsiString;
var
  Stream: TFileStream;
begin
  // AnsiString keeps one byte per character, also on Unicode Delphi (2009 and later)
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, Stream.Size); // one single heap allocation
    Stream.ReadBuffer(Pointer(Result)^, Length(Result)); // one single read
  finally
    Stream.Free;
  end;
end;

That alone should make a big difference. When it comes to writing the file, a similar use of strings will be much faster. I've not attempted to decipher the writing part of your code, but writing the new data and the block of zeros should likewise be batched into as few separate writes as possible.
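
As a rough sketch of what batching the write could look like: build the whole block in memory (new data followed by zero padding) and write it in one call. OverwriteBlock, its parameters, and the scratch file demo.bin are names I made up for this example; the offset and block size would come from your own search and layout.

```pascal
program PatchBlockDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: overwrites BlockSize bytes starting at Offset with
// NewData followed by zero padding, using a single write call.
procedure OverwriteBlock(const FileName: string; Offset: Int64;
  const NewData: TBytes; BlockSize: Integer);
var
  Stream: TFileStream;
  Buffer: TBytes;
begin
  if Length(NewData) > BlockSize then
    raise Exception.Create('NewData does not fit into the block');

  // SetLength zero-fills newly allocated dynamic array elements,
  // so the padding is already in place
  SetLength(Buffer, BlockSize);
  if Length(NewData) > 0 then
    Move(NewData[0], Buffer[0], Length(NewData));

  Stream := TFileStream.Create(FileName, fmOpenReadWrite or fmShareDenyWrite);
  try
    Stream.Position := Offset;
    if BlockSize > 0 then
      Stream.WriteBuffer(Buffer[0], BlockSize); // one single write
  finally
    Stream.Free;
  end;
end;

var
  Scratch: TFileStream;
  Original: TBytes;
begin
  // Create a small scratch file so the sketch can run on its own
  SetLength(Original, 32);
  FillChar(Original[0], Length(Original), $FF);
  Scratch := TFileStream.Create('demo.bin', fmCreate);
  try
    Scratch.WriteBuffer(Original[0], Length(Original));
  finally
    Scratch.Free;
  end;

  // Overwrite 16 bytes at offset 8 with two data bytes plus zero padding
  OverwriteBlock('demo.bin', 8, TBytes.Create($AA, $BB), 16);
end.
```

For the file in the question, the call would presumably look like OverwriteBlock('file.bin', PatternOffset + 4, Data, 4088), with PatternOffset being the 0-based position of the byte sequence.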

If ever you find that you need to read or write very small blocks to a file, then I offer you my buffered file streams: Buffered files (for faster disk access).

The code could be optimised further to read only a portion of the file, and search until you find the target. You may be able to avoid reading the entire file that way. However, I suspect that these changes will make enough of a difference.
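
A possible shape for that partial read, sketched under my own assumptions (the helper name FindInStream, the chunk size, and the plain byte-by-byte comparison are all illustrative): read fixed-size chunks, stop at the first match, and carry the last few bytes of each chunk over to the next one so that a sequence straddling a chunk boundary is not missed.

```pascal
program ChunkedSearchDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: scans Stream from its start in ChunkSize pieces and
// returns the 0-based offset of the first occurrence of Pattern, or -1.
function FindInStream(Stream: TStream; const Pattern: TBytes;
  ChunkSize: Integer): Int64;
var
  Buffer: TBytes;
  Carry, NewCarry, BytesRead, Avail, i, j, m: Integer;
  BasePos: Int64; // stream offset that corresponds to Buffer[0]
begin
  Result := -1;
  m := Length(Pattern);
  if (m = 0) or (ChunkSize < m) then
    Exit;

  SetLength(Buffer, ChunkSize + m - 1); // room for carried bytes plus one chunk
  Carry := 0;
  BasePos := 0;
  Stream.Position := 0;
  repeat
    BytesRead := Stream.Read(Buffer[Carry], ChunkSize);
    Avail := Carry + BytesRead;

    // Check every start position that has a full pattern length available
    for i := 0 to Avail - m do
    begin
      j := 0;
      while (j < m) and (Buffer[i + j] = Pattern[j]) do
        Inc(j);
      if j = m then
      begin
        Result := BasePos + i;
        Exit; // stop reading as soon as the sequence is found
      end;
    end;

    // Keep the last m - 1 bytes: a match may start there and
    // continue in the next chunk
    NewCarry := m - 1;
    if NewCarry > Avail then
      NewCarry := Avail;
    if NewCarry > 0 then
      Move(Buffer[Avail - NewCarry], Buffer[0], NewCarry);
    Inc(BasePos, Avail - NewCarry);
    Carry := NewCarry;
  until BytesRead = 0;
end;

var
  Mem: TMemoryStream;
  Data, Pattern: TBytes;
begin
  // Made-up data; with a chunk size of 4 the sequence straddles two chunks
  Data := TBytes.Create(0, 0, 0, $CD, $09, $D9, $F5, 0, 0, 0);
  Pattern := TBytes.Create($CD, $09, $D9, $F5);
  Mem := TMemoryStream.Create;
  try
    Mem.WriteBuffer(Data[0], Length(Data));
    Writeln(FindInStream(Mem, Pattern, 4));
  finally
    Mem.Free;
  end;
end.
```

On a real file you would pass a TFileStream and a much larger chunk size, for example 64 KB; the tiny value here only serves to exercise the boundary case.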

David Heffernan