
I am using Delphi 7, Windows 7 and Absolute Database.

Quick background: I work for a charity shop that relies on items being donated to us. To reclaim 'Gift Aid' from HMRC we have to submit details of all sales together with the name and address of the donor of each item sold. We help people with special needs, so accurate data input is important.

Up to now, Post Code verification was easy, because (based on our local area) every code followed the format AA00_0AA or AA0_0AA (where _ is a space). As our name has become better known, not all the Post Codes we receive follow these rules.
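The old check amounted to something like this simplified sketch (not my exact code, just the idea; plain Delphi 7, UpperCase and Trim come from SysUtils):

    function MatchesLocalFormat(const S: string): Boolean;
    //  Hypothetical helper: accepts only AA00 0AA or AA0 0AA.
      function IsAlpha(C: Char): Boolean;
      begin
        Result := C in ['A'..'Z'];
      end;
      function IsDigit(C: Char): Boolean;
      begin
        Result := C in ['0'..'9'];
      end;
    var
      T: string;
    begin
      T := UpperCase(Trim(S));
      Result := False;
      case Length(T) of
        7:  //  AA0 0AA
          Result := IsAlpha(T[1]) and IsAlpha(T[2]) and IsDigit(T[3]) and
                    (T[4] = ' ') and IsDigit(T[5]) and IsAlpha(T[6]) and IsAlpha(T[7]);
        8:  //  AA00 0AA
          Result := IsAlpha(T[1]) and IsAlpha(T[2]) and IsDigit(T[3]) and IsDigit(T[4]) and
                    (T[5] = ' ') and IsDigit(T[6]) and IsAlpha(T[7]) and IsAlpha(T[8]);
      end;
    end;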

I have access to Royal Mail's database of UK addresses, so I thought to compare the entered Post Code with a real Post Code. The RM csv file is huge, so I use GSplit3 to break it down into more manageable files. This leaves me with 492 csv files, each of about 62,000 lines. Note that I am only interested in the Post Codes, so there is massive duplication.

To load these files into a dataset (without duplication) I first loaded the file names into a listbox and ran a loop to iterate through all the files, copying each one to the dataset. To avoid duplication I tried putting a unique index on the field, but even running outside of Delphi I still got an error message about duplication.
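(For reference, I believe the usual way to make the unique-index route work is to trap the key violation when posting, something like the sketch below. The exception class is a guess, as I don't know exactly what Absolute Database raises on a key violation, and I never got this approach working:)

    //  Hypothetical sketch only: rely on the unique index and swallow
    //  duplicate-key errors instead of pre-checking for duplicates.
    while not cd.Eof do
      begin
        table1.Append;
        table1.Fields[0].AsString := cd.Fields[0].AsString;
        try
          table1.Post;
        except
          on E: EDatabaseError do  //  guess at the key-violation class
            table1.Cancel;         //  duplicate: discard pending record
        end;
        cd.Next;
      end;

I then tried capturing the text of the last record to be appended and comparing it with the next record: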

    procedure TForm1.importClick(Sender: TObject);
    var
      i: Integer;
      lstfile: string;
    begin
      for i := 0 to ListBox1.Items.Count - 1 do
        begin
          lstfile := '';
          cd.Active := False;  //  cd is a csv dataset loaded with the csv file
          cd.FileName := 'C:\paf 112018\CSV PAF\' + ListBox1.Items[i];
          cd.Active := True;
          while not cd.Eof do
            begin
              if lstfile = cd.Fields[0].AsString then
                cd.Next
              else
                table1.Append;
              //  NB: no begin/end around the else branch, so the three
              //  lines below run on every pass; when a duplicate is found,
              //  cd.Next therefore runs twice and a record gets skipped
              table1.Fields[0].AsString := cd.Fields[0].AsString;
              lstfile := cd.Fields[0].AsString;
              cd.Next;
            end;
        end;
      table1.Edit;
      table1.Post;
    end;

This seemed to work, although the total number of records in the dataset seemed low. I checked for my own Post Code and it wasn't there, although another Post Code was located, so obviously records had been skipped. I then tried loading each CSV file into a string list with Duplicates set to dupIgnore, then copying the string list to the dataset.

        unit Unit1;

        interface

        uses
          Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
          Dialogs, Grids, DBGrids, SMDBGrid, StdCtrls, DB, ABSMain, SdfData;

        type
          TForm1 = class(TForm)
            cd: TSdfDataSet;
            dscd: TDataSource;
            dst: TDataSource;
            ABSDatabase1: TABSDatabase;
            table1: TABSTable;
            table1PostCode: TStringField;
            q: TABSQuery;
            Label2: TLabel;
            ListBox1: TListBox;
            Edit1: TEdit;
            getfiles: TButton;
            import: TButton;
            Button1: TButton;
            Button2: TButton;
            procedure getfilesClick(Sender: TObject);
            procedure importClick(Sender: TObject);
            procedure Button1Click(Sender: TObject);
            procedure Button2Click(Sender: TObject);
          private
            { Private declarations }
          public
            { Public declarations }
          end;

        var
          Form1: TForm1;

        implementation

        {$R *.dfm}

        procedure ListFileDir(Path: string; FileList: TStrings);
        var
          SR: TSearchRec;
        begin
          if FindFirst(Path + '*.csv', faAnyFile, SR) = 0 then
            begin
              repeat
                if (SR.Attr and faDirectory) = 0 then  //  skip directories
                  FileList.Add(SR.Name);
              until FindNext(SR) <> 0;
              FindClose(SR);
            end;
        end;

        procedure TForm1.getfilesClick(Sender: TObject);
        begin  //  fill the listbox with the csv file names
          ListFileDir('C:\paf 112018\CSV PAF\', ListBox1.Items);
        end;

        //  iterate through the files and append their post codes to the dataset
        procedure TForm1.importClick(Sender: TObject);
        var
          i, y: Integer;
          myl: TStringList;
        begin
          for i := 0 to ListBox1.Items.Count - 1 do
            begin
              myl := TStringList.Create;
              myl.Sorted := True;
              myl.Duplicates := dupIgnore;  //  drop duplicates as they are added
              cd.Active := False;
              cd.FileName := 'C:\paf 112018\CSV PAF\' + ListBox1.Items[i];
              cd.Active := True;
              while not cd.Eof do
                begin
                  if cd.Fields[0].AsString <> '' then
                    myl.Add(cd.Fields[0].AsString);
                  cd.Next;
                end;
              for y := 0 to myl.Count - 1 do
                begin
                  table1.Append;
                  table1.Fields[0].AsString := myl.Strings[y];
                end;
              myl.Destroy;  //  also tried myl.Free and FreeAndNil(myl)
            end;
          table1.Edit;
          table1.Post;
        end;

        procedure TForm1.Button1Click(Sender: TObject);
        begin
          table1.Locate('Post Code', Edit1.Text, []);
        end;

        procedure TForm1.Button2Click(Sender: TObject);
        begin
          q.Close;
          q.SQL.Clear;
          q.SQL.Add('Select * from postc where [Post Code] like :sel');
          q.ParamByName('sel').AsString := Edit1.Text;  //  no % wildcard, so this acts like =
          q.Active := True;
        end;

        end.

This starts well but soon slows down, I presume because of memory leaks. I have tried myl.Free, FreeAndNil(myl) and finally myl.Destroy, but they all slow down quickly. I am not an expert, but I enjoy using Delphi and normally manage to solve problems through your pages or by googling, but this time I am stumped. Can anyone suggest a better method, please?

dcs
  • I have said I am not an expert and I thought the only way to learn was by experimenting. As for the Post: I have found that if you append another record then the open record is automatically posted; the final Edit/Post was for the last record in the file – dcs Mar 16 '19 at 10:20
  • Thank you for the reply. Re Edit followed by Post: I have found that after calling Append/Insert or Edit and making the changes, calling Post alone gives me an error message "dataset not in edit or insert mode", so I normally do it this way to avoid problems. Re the dataset declarations: basically both the table and the CSV dataset have just one string field each. While I appreciate your comments, my original question was why the loop is slowing down – dcs Mar 16 '19 at 11:22
  • You might consider a different approach. The UK Government [supplies a regex](https://stackoverflow.com/a/164994/327083) for validating post codes ([see p.6](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/488478/Bulk_Data_Transfer_-_additional_validation_valid_from_12_November_2015.pdf)). Also [related](https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation). (A rough Delphi sketch of this idea follows these comments.) – J... Mar 16 '19 at 17:48
  • Thank you J. This is why I always read stackoverflow for the help that you get – dcs Mar 17 '19 at 05:33
  • Just keep in mind the caveats of the regex solution (it's all in the linked question). Still, it can be useful as a first step to at least check for correct format if you want something quick. You can validate against the proper list in a later step, if needed. – J... Mar 17 '19 at 09:25
  • @dcs Not related to your question but to your overall situation: wouldn't it be better if you collected the TAX number of your donors instead of their name and address? People and organizations can change their name, and people quite commonly move somewhere else, which means the information you collected could become out of date. Sure, HMRC does have historical records so they can trace these changes, but wouldn't providing HMRC with the TAX number of your donors make the whole process much easier, since TAX numbers are unique and can't be changed once assigned ... – SilverWarior Mar 20 '19 at 21:33
  • ... Also, by requesting the TAX number you make sure that you are accepting the donation from a person or organization which HMRC would consider an acceptable donor with regard to "Gift Aid" regulations. – SilverWarior Mar 20 '19 at 21:36
  • @SilverWarior Thank you for your comments. HMRC specify the format claims should be submitted in: using an .ods file we have to submit Initial, Surname, First line of Address, Post Code, Date item sold and the amount. One of our shops sells over 100 donated items a day. I would imagine that not 2 in 100 would know their TAX number offhand; if we started to ask for that I expect people would just donate elsewhere. But thank you again – dcs Mar 22 '19 at 03:56
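For reference, the regex check suggested in the comments above could be sketched in Delphi along the following lines. This is a hedged sketch: Delphi 7 has no built-in regex support, so it assumes the third-party TRegExpr library (unit RegExpr), and the expression is a simplified approximation of the official GOV.UK pattern linked above, not a verbatim copy (it also ignores the special case GIR 0AA).

    uses
      SysUtils,
      RegExpr;  //  third-party TRegExpr library, not part of Delphi 7 itself

    function LooksLikeUKPostCode(const S: string): Boolean;
    //  Hypothetical helper: quick format check only; a string that
    //  passes may still not be a real Post Code, so it should be
    //  validated against the PAF list in a later step if needed.
    var
      R: TRegExpr;
    begin
      R := TRegExpr.Create;
      try
        R.ModifierI := True;  //  accept lower case as well
        R.Expression := '^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$';
        Result := R.Exec(Trim(S));
      finally
        R.Free;
      end;
    end;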

1 Answer


The following shows how to add postcodes from a file containing a list of them to a query-type dataset such as TAdoQuery or your q dataset (your code doesn't show what type q is, as far as I can see).

It also seems from what you say that although you treat your postcode file as a CSV file, you shouldn't actually need to: if the records contain only one field, there is nothing to separate, so there is no need for commas, and the file should simply contain one postcode per line. Consequently, there doesn't seem to be any need to incur the overhead of loading it as a CSV file; you should be able simply to load it into a TStringList and add the postcodes from there.

I'm not going to attempt to correct your code, just show a very simple example of how I think it should be done. So the following code opens a postcode list file, which is assumed to contain one postcode per line, checks whether each entry in it already exists in your table of postcodes, and adds it if not.

  procedure TForm1.AddPostCodes(const PostCodeFileName: String);
  //  The following shows how to add postcodes to a table of existing ones
  //  from a file named PostCodeFileName, which should include an explicit path
  var
    PostCodeList : TStringList;
    PostCode : String;
    i : Integer;
  begin
    PostCodeList := TStringList.Create;
    try
      PostCodeList.LoadFromFile(PostCodeFileName);
      if qPostCodes.Active then
        qPostCodes.Close;
      qPostCodes.Sql.Text := 'select * from postcodes order by postcode';

      qPostCodes.Open;
      qPostCodes.DisableControls;  //  Always call DisableControls + EnableControls
      //  when iterating a dataset which has db-aware controls connected to it
      try
        for i := 0 to PostCodeList.Count - 1 do begin
          PostCode := PostCodeList[i]; //  the PostCode local variable is there to assist debugging
          if not qPostCodes.Locate('PostCode', PostCode, [loCaseInsensitive]) then
            qPostCodes.InsertRecord([PostCode]);  //  InsertRecord does not need to be followed by a call to Post
        end;
      finally
        qPostCodes.EnableControls;
        qPostCodes.Close;
      end;
    finally
      PostCodeList.Free;  //  Don't use .Destroy!
    end;
  end;

Btw, regarding the comment about bracketing an iteration of a dataset with calls to DisableControls and EnableControls: the usual reason for doing this is to avoid the overhead of updating the GUI display of any db-aware controls connected to the dataset. However, one of the reasons I'm not willing to speculate about what is causing your slowdown is that TAdoQuery, one of the standard dataset types that comes with Delphi, benefits massively from DisableControls/EnableControls even when there are NO db-aware controls connected to it. This is because of a quirk in the coding of TAdoQuery. Whatever dataset you are using may have a similar quirk.

MartynA
  • Thank you very much for your example. The reason I load it as a csv file is that the original files have 17 fields, but I am only interested in the first field, hence Fields[0]. Ignore the query part of my code; that was just a quick addition to test the result. – dcs Mar 16 '19 at 15:02
  • Glad to help. Anyway, personally, I would still treat extracting the postcodes from the CSV file as a separate step, loading the extracted postcodes into a TStringList before adding them to the db. Because the TStringList is in-memory, the overhead would be minimal, but it would make it far easier to pinpoint where the bottleneck, if any, is. (A rough sketch of that extraction step is shown below.) – MartynA Mar 16 '19 at 15:07
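For illustration, that extraction step could look something like the sketch below. It is a minimal, hypothetical helper: the procedure name is illustrative, the assumption that the postcode is the first of the 17 comma-separated fields comes from the discussion above, and the quote-stripping is a guess about the PAF file format (it assumes SysUtils and Classes are in the uses clause).

    procedure ExtractPostCodes(const CsvFileName: string; PostCodes: TStringList);
    //  Collects the first comma-separated field of each line of a PAF chunk.
    //  PostCodes should be created with Sorted := True and
    //  Duplicates := dupIgnore so that duplicates are dropped on Add.
    var
      Lines: TStringList;
      Line: string;
      i, p: Integer;
    begin
      Lines := TStringList.Create;
      try
        Lines.LoadFromFile(CsvFileName);
        for i := 0 to Lines.Count - 1 do
        begin
          Line := Lines[i];
          p := Pos(',', Line);
          if p > 0 then
            Line := Copy(Line, 1, p - 1);  //  keep only the first field
          Line := Trim(StringReplace(Line, '"', '', [rfReplaceAll]));
          if Line <> '' then
            PostCodes.Add(Line);           //  dupIgnore discards repeats
        end;
      finally
        Lines.Free;
      end;
    end;

The same TStringList can then be passed over all 492 files before its contents are written to the database in one pass, which separates the file parsing from the database work.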