11

I want to read the entire table from an MS Access file and I'm trying to do it as fast as possible. When testing a big sample I found that the loop counter increases faster when it's reading the top records comparing to last records of the table. Here's a sample code that demonstrates this:

procedure TForm1.Button1Click(Sender: TObject);
const
  MaxRecords = 40000;
  Step = 5000;
var
  I, J: Integer;
  Table: TADOTable;
  T: Cardinal;
  Ts: TCardinalDynArray;
begin
  Table := TADOTable.Create(nil);
  Table.ConnectionString :=
    'Provider=Microsoft.ACE.OLEDB.12.0;'+
    'Data Source=BigMDB.accdb;'+
    'Mode=Read|Share Deny Read|Share Deny Write;'+
    'Persist Security Info=False';
  Table.TableName := 'Table1';
  Table.Open;

  J := 0;
  SetLength(Ts, MaxRecords div Step);
  T := GetTickCount;
  for I := 1 to MaxRecords do
  begin
    Table.Next;
    if ((I mod Step) = 0) then
    begin
      T := GetTickCount - T;
      Ts[J] := T;
      Inc(J);
      T := GetTickCount;
    end;
  end;
  Table.Free;

//  Chart1.SeriesList[0].Clear;
//  for I := 0 to Length(Ts) - 1 do
//  begin
//    Chart1.SeriesList[0].Add(Ts[I]/1000, Format(
//      'Records: %s %d-%d %s Duration:%f s',
//      [#13, I * Step, (I + 1)*Step, #13, Ts[I]/1000]));
//  end;
end;

And the result on my PC: enter image description here

The table has two string fields, one double and one integer. It has no primary key nor index field. Why does it happen and how can I prevent it?

MartynA
  • 30,454
  • 4
  • 32
  • 73
saastn
  • 5,717
  • 8
  • 47
  • 78
  • No. I'm creating the control programmatically, there's nothing more than what you can see in the sample code. – saastn Nov 27 '15 at 21:34
  • Isn't your For loop off by one? Anyway, are you surprised that if you read a lot of records, it involves a lot of memory allocations, and that these take longer the more memory gets allocated? – MartynA Nov 27 '15 at 22:59
  • @MartynA You're right about the loop. But I can't say that it's memory allocation that makes it slower. Seems like it fetches all records in `Table.Open`, Task Manager shows no memory allocation after running that line. – saastn Nov 27 '15 at 23:30
  • Have you tried iterating through it using `while not Table.EOF...`? – Jerry Dodge Nov 27 '15 at 23:38
  • @JerryDodge I checked it right now, but same results. – saastn Nov 27 '15 at 23:56

3 Answers3

20

I can reproduce your results using an AdoQuery with an MS Sql Server dataset of similar size to yours.

However, after doing a bit of line-profiling, I think I've found the answer to this, and it's slightly counter-intuitive. I'm sure everyone who does DB programming in Delphi is used to the idea that looping through a dataset tends to be much quicker if you surround the loop by calls to Disable/EnableControls. But who would bother to do that if there are no db-aware controls attached to the dataset?

Well, it turns out that in your situation, even though there are no DB-aware controls, the speed increases hugely if you use Disable/EnableControls regardless.

The reason is that TCustomADODataSet.InternalGetRecord in AdoDB.Pas contains this:

      if ControlsDisabled then
        RecordNumber := -2 else
        RecordNumber := Recordset.AbsolutePosition;

and according to my line profiler, the while not AdoQuery1.Eof do AdoQuery1.Next loop spends 98.8% of its time executing the assignment

        RecordNumber := Recordset.AbsolutePosition;

! The calculation of Recordset.AbsolutePosition is hidden, of course, on the "wrong side" of the Recordset interface, but the fact that the time to call it apparently increases the further you go into the recordset makes it reasonable imo to speculate that it's calculated by counting from the start of the recordset's data.

Of course, ControlsDisabled returns true if DisableControls has been called and not undone by a call to EnableControls. So, retest with the loop surrounded by Disable/EnableControls and hopefully you'll get a similar result to mine. It looks like you were right that the slowdown isn't related to memory allocations.

Using the following code:

procedure TForm1.btnLoopClick(Sender: TObject);
var
  I: Integer;
  T: Integer;
  Step : Integer;
begin
  Memo1.Lines.BeginUpdate;
  I := 0;
  Step := 4000;
  if cbDisableControls.Checked then
    AdoQuery1.DisableControls;
  T := GetTickCount;
{.$define UseRecordSet}
{$ifdef UseRecordSet}
  while not AdoQuery1.Recordset.Eof do begin
    AdoQuery1.Recordset.MoveNext;
    Inc(I);
    if I mod Step = 0 then begin
      T := GetTickCount - T;
      Memo1.Lines.Add(IntToStr(I) + ':' + IntToStr(T));
      T := GetTickCount;
    end;
  end;
{$else}
  while not AdoQuery1.Eof do begin
    AdoQuery1.Next;
    Inc(I);
    if I mod Step = 0 then begin
      T := GetTickCount - T;
      Memo1.Lines.Add(IntToStr(I) + ':' + IntToStr(T));
      T := GetTickCount;
    end;
  end;
{$endif}
  if cbDisableControls.Checked then
    AdoQuery1.EnableControls;
  Memo1.Lines.EndUpdate;
end;

I get the following results (with DisableControls not called except where noted):

Using CursorLocation = clUseClient

AdoQuery.Next   AdoQuery.RecordSet    AdoQuery.Next 
                .MoveNext             + DisableControls

4000:157            4000:16             4000:15
8000:453            8000:16             8000:15
12000:687           12000:0             12000:32
16000:969           16000:15            16000:31
20000:1250          20000:16            20000:31
24000:1500          24000:0             24000:16
28000:1703          28000:15            28000:31
32000:1891          32000:16            32000:31
36000:2187          36000:16            36000:16
40000:2438          40000:0             40000:15
44000:2703          44000:15            44000:31
48000:3203          48000:16            48000:32

=======================================

Using CursorLocation = clUseServer

AdoQuery.Next   AdoQuery.RecordSet    AdoQuery.Next 
                .MoveNext             + DisableControls

4000:1031           4000:454            4000:563
8000:1016           8000:468            8000:562
12000:1047          12000:469           12000:500
16000:1234          16000:484           16000:532
20000:1047          20000:454           20000:546
24000:1063          24000:484           24000:547
28000:984           28000:531           28000:563
32000:906           32000:485           32000:500
36000:1016          36000:531           36000:578
40000:1000          40000:547           40000:500
44000:968           44000:406           44000:562
48000:1016          48000:375           48000:547

Calling AdoQuery1.Recordset.MoveNext calls directly into the MDac/ADO layer, of course, whereas AdoQuery1.Next involves all the overhead of the standard TDataSet model. As Serge Kraikov said, changing the CursorLocation certainly makes a difference and doesn't exhibit the slowdown we noticed, though obviously it's significantly slower than using clUseClient and calling DisableControls. I suppose it depends on exactly what you're trying to do whether you can take advantage of the extra speed of using clUseClient with RecordSet.MoveNext.

MartynA
  • 30,454
  • 4
  • 32
  • 73
  • Thank you very much, `DisableControls` worked for me. But, unlike your results, `clUseServer` is not slower than `clUseClient` here. Although, dataset doesn't return any records after setting `CursorLocation` to `clUseServer`, unless I set `LockType` to `ltReadOnly`. – saastn Nov 28 '15 at 22:20
  • @MartynA out of curiosity, which profiler did you use? – Christian Holm Jørgensen Nov 29 '15 at 21:44
  • @ChristianHolmJørgensen: I used the line profiler of Nexus Quality Suite (www.nexusdb.com) which is a reincarnation of the old Turbopower product of a similar name. – MartynA Nov 29 '15 at 21:48
1

When you open a table, ADO dataset internally creates special data structures to navigate dataset forward/backward - "dataset CURSOR". During navigation, ADO stores the list of already visited records to provide bidirectional navigation.
Seems ADO cursor code uses quadratic-time O(n2) algorithm to store this list.
But there are workaround - use server-side cursor:

Table.CursorLocation := clUseServer;  

I tested your code using this fix and get linear fetch time - fetching every next chunk of records takes the same time as previous.

PS Some other data access libraries provides special "unidirectional" datasets - this datasets can traverse only forward and don't even store already traversed records - you get constant memory consumption and linear fetch time.

1

DAO is native to Access and (IMHO) is typically faster. Whether or not you switch, use the GetRows method. Both DAO and ADO support it. There is no looping. You can dump the entire recordset into an array with a couple of lines of code. Air code: yourrecordset.MoveLast yourrecordset.MoveFirst yourarray = yourrecordset.GetRows(yourrecordset.RecordCount)

AVG
  • 1,317
  • 8
  • 12
  • Maybe, but the OP is asking about Delphi code, and in Delphi, you don't typically work of arrays of db records. – MartynA Nov 28 '15 at 13:59
  • Thanks MartynA. I know nothing about Delphi, but just thought it might have similar structures to other languages. – AVG Nov 28 '15 at 19:15
  • Well, it *can* have them (just by declaring an array of suitable type) but it's just not the "Delphi" way of doing things. The point is, in Delphi, all supported dataset types are descendants of one ancestor (TDataset) which contains a generalised model of a dataset with movable logical cursor. And all its db-aware controls are designed to interact with this model, rather than arrays. A consequence is that all its db-aware controls work with any supported TDataset descendant. – MartynA Nov 28 '15 at 19:27