0

I have a Field named "refFile" which may or may not have a Path description in it.

I need to go through the entire database and check whether Paths that are defined in "refFile" still actually exist.

Using Delphi (Pascal) this takes many, many minutes

bigDB.First;
while not(bigDB.EOF) do
begin 
  if Trim(bigDB.FieldByName('refFile').AsString) > '' then
  begin
    if not(FileExists(bigDB.FieldByName('refFile').AsString)) then
    begin
      bigDB.Edit;
      bigDB.FieldByName('refFile').AsString:='';
      bigDB.Post;
    end;
  end;
  bigDB.Next;
end;   

How do I do that in SQL?

Thank you.

LinuxFerLife
  • 199
  • 7
  • What is the component that you use? FireDAC, ADO, dbExpress..? First thing to improve is caching field references (get rid of repetitive asking for field by name). Next merely depends, are all the files under the same directory? Or anywhere on any drive? I mean, is it worth building a file tree for this operation? Next might be fast iteration over the dataset without moving the cursor. So as checking what's in your dataset, whether you're not storing and fetching blobs in this dataset for example. – Victoria Aug 27 '17 at 18:15
  • Why do you actually need to know this information "in forward"? File existence can change very often.. – Victoria Aug 27 '17 at 20:06

3 Answers3

2

You cannot check the validity of a path in SQLLite but you can filter records with something in the path and reduce the list of lines to check.

You can order the records on this field (if you have an index on it) and check only the paths you didn't checked before.

You also can use threads to do this long operation in background. Simply use TThread.Createanonymousthread(procedure begin end).Start;

  • the best solution is to filter the records in SQL (order + where) and in Delphi check only paths that haven't been checked before. using a thread permit to do the work background and don't freeze the application. – Patrick PREMARTIN Aug 27 '17 at 18:22
  • Don't worry, I won't :) I misundertood you, sorry. Hence I deleted my comment. Let's delete our conversation from here, please. What you suggest assumes that files are located in the same paths. That might not be so. Consider the worst case, each one in different path relative to the root. – Victoria Aug 27 '17 at 18:24
  • The first sentence is the answer to the question. – Sertac Akyuz Aug 27 '17 at 18:30
  • @Sertac, really? Consider the worst case when all the records will be pointing to unique paths relative to drive root. What you'll reduce when having just `C:\Folder1\File1.ext` and `C:\Folder2\File2.ext` records sorted? I'm not against that idea, hence I asked the OP _"are all the files under the same directory"_. – Victoria Aug 27 '17 at 18:34
  • @Victoria - I was not talking about that. The question is how to clear field data if the contents do not point to existing files by means of SQL. The answer is the database won't help with the files but it can help working on a subset. – Sertac Akyuz Aug 27 '17 at 19:21
  • @Sertac, first off, you cannot _"filter records with something in the path and reduce the list of lines to check"_ if you're about to check if all recorded file names exists. You can at most sort them, fetch to the client and start cracking their paths and there probably by `FindFirst(Next)` list files and check if exists. How that will perform in case when all the files will be in different directories relative to the root is hard to say (if pure `FileExists` won't be faster there). But it's the way that can help. Threads won't improve performance. To sum up, points 1 and 3 are not accurate. – Victoria Aug 27 '17 at 19:34
  • @Victoria - What the answer means is that you can retrieve records having non-empty values for the path/file field. You can certainly do that and it will reduce the list of lines to check by the amount of non-retrieved records. The rest, I don't care - it's not asked. – Sertac Akyuz Aug 27 '17 at 20:11
  • @Sertac, aha, so you say that filtering for NULLs is the answer? Hmm, actually don't care is the reason why I feel a bit disappointed by the voters here (for example a few days ago I was trying to explain that [it's not possible to cast NULL to a value](https://stackoverflow.com/a/45826485/8041231); result score -2 and some delete vote). – Victoria Aug 27 '17 at 20:13
  • 3
    @Victoria - Exactly. My experience is that if you diverge from the question and try to solve the *actual* problem, the question gets lost and the asker (even so for a newcomer) gets the impression that you have to solve what's not asked. My opinion is that we should instead encourage asking better questions. – Sertac Akyuz Aug 27 '17 at 20:16
  • @Sertac, it seems you are right at this point (I've experienced some hates when trying to go deeper to Y when X was asked). I'll focus more on something else than this site then as I prefer forum like attitude :) – Victoria Aug 27 '17 at 20:18
  • 2
    @Victoria - FWIW, I hope you will consider not abandoning SO as I think you bring a lot of value via your inputs and comments. – Rohit Aug 28 '17 at 07:59
  • @Victoria The question is how to do it in SQL. The answer is "it's not possible in SQL language". I consider using a UDF is not better than working with Delphi. Getting only not empty path is a little optimization, but it is. Sorting the results permit to check paths (and files). When the path don't exists, it's not neccessary to check all files and subdirectories in this path. If the problem is to check files in existing paths, you have to check each file. It's time consuming. There is no other way than a loop and a fileexists. – Patrick PREMARTIN Aug 28 '17 at 08:47
  • @Victoria Thread won't accelerate the process and in this case parallelism can't be used (because of sharing ressources), but threads allow the user to continue using the program and don't generate "application freeze" alert from the operating system. – Patrick PREMARTIN Aug 28 '17 at 08:49
  • 1
    @Victoria: " FWIW, I hope you will consider not abandoning SO" Me, too, I think your contributions here are invaluable. – MartynA Aug 28 '17 at 11:20
  • 2
    @Victoria: Btw, I think this q and all the comments just goes to show that inadequate questions generate more heat than light. So, I've done what I wish more people had done in the first place, namely voted to close because in its context the q isn't clear enough to be answered properly. – MartynA Aug 28 '17 at 11:33
2

You can't check the existence of a file in a plain SQLLite query. You could do that by using an UDF (User defined function) but it would be a little more complex and would requires some skills in other programming languages (Note that in that case your files should be accessible from the server, otherwise it wouldn't work).

If you are looking for a simpler solution, I think you can speed up your program by reducing the number of records resulted by the query and by improving your Delphi code in order to make it a little more efficient.

Select SQL:

  • Use length and trim functions due to reduce the number of records to be verified by your Delphi code.

    select refFile 
    from myTable
    where (refFile is not null) and (length(trim(refFile)) > 0)
    

Delphi:

  • Call TDataSet.FieldByName only once.
  • Try using TDataSet.DisableControls and TDataSet.EnableControls (In this way, some dataset's components are faster, even if the dataset component is not linked to any control).

    var
      Fld : TField;
    begin
      BigDB.DisableControls();
      try
        Fld := BigDB.FieldByName('refFile');
    
        BigDB.First;
        while not(BigDB.Eof) do
        begin
          if not(FileExists(Fld.AsString)) then
          begin
            BigDB.Edit;
            Fld.AsString := '';
            BigDB.Post;
          end;
          BigDB.Next;
        end;
      finally
        BigDB.EnableControls();
      end;
    

Furthermore, you could consider these other optimizations:

  • If the refFile field contains the same value multiple times, you could sort the query by the refFile field and change the Delphi code in order to verify each filename only once. (You can do that by storing the last value and the result of the FileExists function).
  • You can run your code asyncronusly by using the TThread class. In this way your application won't freeze and it could be faster.
Fabrizio
  • 7,603
  • 6
  • 44
  • 104
  • Think about it, normally a database engine does not even run on the client side. How can a UDF help filter client files at the server side? Though I doubt that this would be your downvote reason, it rather looks like a downvoter by nature has trespassed. – Sertac Akyuz Aug 28 '17 at 03:08
  • 1
    @SertacAkyuz: You're right, I've taken it for granted that files were also accessible from the server. I've updated the first part of the answer by specifying this fact. – Fabrizio Aug 28 '17 at 06:26
  • You can register UDF in Delphi as well. It's about importing a few C functions. Your client library just needs to provide handle to the opened database. – Victoria Aug 28 '17 at 14:17
  • 1
    @exd - Thanks. Also note the client side would have to store paths through a share. I.e. "c:\..something" wouldn't mean anything to the server if the database server is not on the same machine. – Sertac Akyuz Aug 28 '17 at 18:42
2

For example with FireDAC it's extremely easy to create user defined functions. If you're using it, try something like this. It could save some time because the engine doesn't need to fetch the resultset to the client application:

uses
  FireDAC.Phys.SQLiteWrapper;

type
  TForm1 = class(TForm)
    Button1: TButton;
    FDQuery1: TFDQuery;
    FDConnection1: TFDConnection;
    FDGUIxWaitCursor1: TFDGUIxWaitCursor;
    procedure FormCreate(Sender: TObject);
    procedure Button1Click(Sender: TObject);
  private
    FValidator: TSQLiteFunction;
    procedure ValidateFile(AFunc: TSQLiteFunctionData; AInputs: TSQLiteInputs;
      AOutput: TSQLiteOutput; var AUserData: TObject);
  end;

procedure TForm1.FormCreate(Sender: TObject);
begin
  FDConnection1.Open;

  FValidator := TSQLiteFunction.Create((TObject(FDConnection1.CliObj) as TSQLiteDatabase).Lib);
  FValidator.Args := 1;
  FValidator.Name := 'FileExists';
  FValidator.OnCalculate := ValidateFile;
  FValidator.InstallAll;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  FDQuery1.SQL.Text :=
    'UPDATE MyTable SET FileName = NULL WHERE ' +
    'FileName IS NOT NULL AND NOT FileExists(FileName)';
  FDQuery1.ExecSQL;
end;

procedure TForm1.ValidateFile(AFunc: TSQLiteFunctionData; AInputs: TSQLiteInputs;
  AOutput: TSQLiteOutput; var AUserData: TObject);
begin
  AOutput.AsBoolean := FileExists(AInputs[0].AsString);
end;

Or simply drop the TFDSQLiteFunction component, fill out the FunctionName property with name of the function, write OnCalculate event handler similar to the above and enable the component by setting the Active property.

Victoria
  • 7,822
  • 2
  • 21
  • 44
  • Purely speculating but I would guess this would have an additional slowing effect since for each record an additional round trip would be required to the client, assuming the database server is on a different box. – Sertac Akyuz Aug 28 '17 at 18:45
  • @Sertac, I assume local use only (sorry, this post is a guess of using FireDAC, so I didn't mention possibility of a remote machine use). With local use you may gain some speedup because fetching the whole file name resultset to the application, checking for existence and updating only non existing file tuples should be more expensive than a UDF callback. – Victoria Aug 28 '17 at 19:20