1

I have a file with a fixed pattern (CONST_) and a running number (XXXX) like this: CONST_XXXX.XYZ.

I am looking for an efficient way to get the file with the highest number in Delphi. Traversing with FindFirst / FindNext seems to be inefficient, if there are many files.

Alois Heimer
  • 1,772
  • 1
  • 18
  • 40
  • 1
    What about searching for patterns like `CONST_9*`. If `FindFirst` finds a file, then you'll continue with `CONST_99*`. If no file with pattern `CONST_99XX` is found, then you'll continue to `CONST_98*` etc. – TLama Jan 30 '14 at 13:22
  • @TLama That would be a possible solution. But I was hoping there would be an easier approach, that I am not aware of. Something like a FindFirst combined with an ORDERBY :) – Alois Heimer Jan 30 '14 at 13:27
  • That would not be that hard to implement. The question is how much time takes the allocation for the `FindFirst` (`FindFirstFile` respectively) function call. – TLama Jan 30 '14 at 13:31
  • @AloisHeimer In order to order the file names, they have to be enumerated. – David Heffernan Jan 30 '14 at 13:32
  • To be clear, are you saying that all filenames include exactly 4 digits? – David Heffernan Jan 30 '14 at 14:06
  • @DavidHeffernan I am free to choose the system, so yes I can define, that all files have exactly 4 (or in reality better 8) digits. Then I would start with 00000001. I am just planning. If there is no good way to do this (ala for instance `FindLast`), I will look for a different solution to my problem. – Alois Heimer Jan 30 '14 at 14:25
  • I don't know the performance of FileExist but you could try a binary search pattern. If FileExist has the same performance as one findfirst/findnext call pattern then this approach is a good way to go. – mrabat Jan 30 '14 at 14:28

4 Answers4

4

It is well known that finding the maximum of a list, in general, requires all items to be checked. And I believe that the most efficient way to do that is to use FindFirstFile/FindNextFile or related APIs. It's hard to imagine that there will be any real way to improve on the official system API for enumerating files.

That was certainly the opinion offered here: Is there a faster alternative to enumerating folders than FindFirstFile/FindNextFile with C++? Note that I am rejecting out of hand the option of parsing the file system by hand. I don't regard that as being very practical.

On the other hand, this answer offers hope that FindFirstFileEx with FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH may lead to better performance than plain old FindFirstFile.

You may need to look for an alternative solution to your problem, one that does not involve repeated enumerations of a directory full of files. Perhaps using a database so that you can take advantage of indexing. In fact, it is plausible that the built-in indexing service could be of use.

Community
  • 1
  • 1
David Heffernan
  • 601,492
  • 42
  • 1,072
  • 1,490
  • Your argumentation, that you have to traverse the whole list, is only true if you have no ordering. I have no clue, but I would bet the implementation of FindFirst uses some kind of indexing, if available. But I think you are right, given that there are only hacky solutions, I will think about an alternative. – Alois Heimer Jan 31 '14 at 22:59
  • @Alois FindFirstFile enumerates in directory order, I believe. NTFS directories are sorted I think. But not all directories are NTFS. – David Heffernan Jan 31 '14 at 23:11
  • I'll give up on this idea. So I wont benchmark the suggestion of @user3256855. Until someone provides data, that this suggestion is actually faster than iteration, I will accept your hints as answer. Thx – Alois Heimer Feb 01 '14 at 00:11
2

How about something like this:

for I := 0 to MAX_DIGITS - 4
begin
    S := 'CONST_' + StringOfChar('0', I);
    for C := '9' downto '1' do
    begin
      if FindFirst(S + C + '*.XYZ', faAnyFile, SearchResult) = 0 then
      begin
          //Code to iterate through the results using FindNext 
          //and returning "biggest" Name
          Result := SearchResult.FileName
          while FindNext(SearchResult) = 0 
            //ommitted: handling dirs / hidden
            if CompareStr(Result, SearchResult.FileName) < 0 then
              Result := SearchResult.FileName;
          //adding recursion instead of while... should make it even faster
          FindClose(SearchResult);
          Break;
      end; 
    end;
end;

Warning: this code has not been tested

  • Your solution does exactly what my function `getRegion()` does. But that's all. (I have to your search term `4*.XYZ` added `CONST_` . In our test environment: (`CONST_0000.XYZ` to `CONST_4999.XYZ`) CONST_4000.XYZ is found and `breaks` then. This is logical, because search term `CONST_4*.XYZ` finds `CONST_4000.XYZ` and breaks. And that is not `CONST_4999.XYZ` ! – moskito-x Jan 31 '14 at 12:22
  • A wrong train of thought is if `MAX_DIGITS` > 4 . Then 1.loop search with term `CONST_08*.XYZ` finds `CONST_08000.XYZ` and breaks . With 2 Loop with term (CONST_8*.XYZ) it finds (CONST_80000.XYZ) and breaks. So first Loop completely useless. – moskito-x Jan 31 '14 at 13:04
  • @moskito-x ehmm. I have done it the wrong way round -> will correct this. – user3256855 Jan 31 '14 at 13:10
  • I know what you trying to achieve when you implement it correctly, it is the quickest way. It follows @TLamas comment : `What about searching for patterns like CONST_9*. If FindFirst finds a file, then you'll continue with CONST_99*` – moskito-x Jan 31 '14 at 13:10
  • @moskito-x Now it should be better. But the question remains, if it is really faster. Depending on the performance of FindFirst vs. FindNext – user3256855 Jan 31 '14 at 13:16
  • @moskito-x I'm starting with _9*..._1*, then 09*...01*, then 009*...001*, then 0009*...0001* and so on. (My assumption is the number is for instance 000353445). Could even be improved by some recursivity... Your solution otoh stops after _9*..._1* – user3256855 Jan 31 '14 at 13:28
  • Why you not start with : `for C := '9' downto '0' do` ?? Without `S` in `FindFirst(C + '*.XYZ',...)` . And as I said it follows @TLama 's comment. – moskito-x Jan 31 '14 at 13:32
  • @moskito-x My solution is starting with `'9' downto '0'` 'cause S is `'CONST_'` in the first Iteration. Thereafter your solution stops but mine continues with '09' downto '01'. And yes, this follows @TLama's comment (if you add some recursion after the first hit) – user3256855 Jan 31 '14 at 13:42
  • I know but for `I := 0 to MAX_DIGITS - 4` is not necessary . `for C := '9' downto '0' do ... if FindFirst('CONST_' + C + '*.XYZ',...)` Is good. – moskito-x Jan 31 '14 at 13:48
  • @moskito-x No. What happens if CONST_0 is reached in your solution (with 000353445 for example)? All remaining items are traversed - what we try to avoid. In my solution it is started again with the next chars. Just try it out. For big MAX_DIGITS and a FileCount 2-3 magnitudes smaller than MAX, I expect a big difference. – user3256855 Jan 31 '14 at 16:00
  • Please : The important part , look at your code there you write a comment `//Code to iterate through the results using FindNext` you don't show us this part . So a simple loop from `CONST_9*.XYZ` to `CONST_0*.XYZ` Is only a good beginning. Show us exactly the code to get `CONST_000353445.XYZ` . – moskito-x Jan 31 '14 at 16:25
  • @moskito-x The code at the comment is not the important part - it is straightforward. I've added it nevertheless. I only wanted to present the idea - cannot write real code, cause I cannot test here. If one implements this as a function this could be recursively called instead of the code at my comment for MAX_DIGITs larger than say 4. – user3256855 Jan 31 '14 at 17:26
  • What good for is `CompareStr(Result, SearchResult.FileName)` and if you think you get automatic a higher value. Then I have to disappoint you. After Deleting and creating new files the FindFirst AND FindNext Order is now : it starts with CONST_0060.XYZ CONST_0061.XYZ CONST_0062.XYZ and ends with CONST_4998.XYZ CONST_4999.XYZ CONST_0000.XYZ CONST_0001.XYZ CONST_0002.XYZ CONST_0003.XYZ ... – moskito-x Jan 31 '14 at 18:14
  • @moskito-x CompareStr finds the file with the highest number, because Findxxx provides unordered data. – user3256855 Jan 31 '14 at 22:36
  • stackoverflow is not the right place to set incomplete, untested code. Your code has some wrong places and therefore does not work . Look at times the description of at `FindNext()` . So what is wrong with : `while FindNext(SearchResult) <> 0` and also the description of `CompareStr()` So what is wrong with : `if CompareStr(Result, SearchResult.FileName) > 0` ?? – moskito-x Feb 01 '14 at 01:04
  • @moskito-x cmon moskito, don't be so stingy, I had no compiler at hand - you spotted the bugs, but hey, the idea was clear. Normally I am proud of even writing one LOC without code completion. btw feel free to improve the code. – user3256855 Feb 01 '14 at 02:51
  • As I have already said the inner part is important, because you can make some mistakes, you only show a comment. `// Code to iterate ....`. Now you have code written so it is better. Also, your code needs more time than others. e.g. CompareStr is slow. You: "but hey, the idea was clear.": This idea is crystal clear since @TLama 's first comment. Only the implementation is not well done. – moskito-x Feb 01 '14 at 12:24
  • @moskito-x Ah - I see – user3256855 Feb 03 '14 at 00:06
-1

An alternative would be

for I := 9999 downto 0 do
  begin
  FileName := Format ('CONST_%.4d.XYZ', [I]);
  if FileExists(FileName) then
    Break;
  end;  

Whether this is faster or not depends on the numbers you are expecting and on the performance of FileExists vs. FindFirst which I cannot comment on.

jpfollenius
  • 16,456
  • 10
  • 90
  • 156
  • FileExists involves a search. That's got to be more expensive than enumeration. – David Heffernan Jan 30 '14 at 14:05
  • @David: Agreed, I think it's definitely slower. I don't think there's a faster way than enumeration then. – jpfollenius Jan 30 '14 at 17:13
  • This cannot possibly be faster than `FindFirst/FindNext`, since it basically does `FindFirst` for every single file individually along with a `FindClose` if the file exists. – Ken White Jan 31 '14 at 01:15
  • @KenWhite: I think so too although your argument is not very convincing in itself. It really depends on the highest number, the density of existing numbers, the performance of `FindNext` and `FindClose` and more. I am not saying it is faster, but I would definitely measure it this would be my question. – jpfollenius Jan 31 '14 at 07:31
-1

The another way is to read all occurring CONST_*.XYZ into FileListBox and then show the last.

procedure TForm1.Button1Click(Sender: TObject);
begin
FileListBox1.Directory:='D:\samples';
FileListBox1.Mask:='CONST_*.XYZ';
FileListBox1.Update;
Label1.Caption:= FileListBox1.Items[FileListBox1.Items.Count-1];
end;

To make it faster, you can use a function

function getRegion(filestr:string):Boolean;
begin
  if FindFirst(filestr, faAnyFile, searchResult) = 0 then result:=true else result:=false;
  if result then begin
     findN:=filestr;
  end;
end;

begin

SetCurrentDir('D:\samples');
  for i:=9 downto 0 do begin
    if getRegion(Format ('CONST_%.1d*.XYZ', [i])) then break;
  end;

FileListBox1.Directory:='D:\samples';
FileListBox1.Mask:=findN;
FileListBox1.Update;
Label1.Caption:= FileListBox1.Items[FileListBox1.Items.Count-1];

Update
For test A) files were created from 0000-4999
For test B) files were created from 0000-9999
TestA made files from 0000 to only 4999 because user jpfollenius uses downto

from 0000 to 4999 = 5000 files
from 9999 downto 4999 = 5000 files

enter image description here

Testtable TestA

enter image description here

Test B

I'm shure with more files 50000 files my solution loads 10000 filenames
for example 50000 to 59999 that takes

  • moskito-x .................................. 0.345 seconds (tested)
  • pure FindFirst / FindNext .......... 0.390 seconds estimated to (0.039 * 10)
moskito-x
  • 11,832
  • 5
  • 47
  • 60
  • This is faster than FindFirst? What if you don't want a GUI? – David Heffernan Jan 30 '14 at 15:15
  • OP : I am looking for an efficient way to get the file with the **highest number** in Delphi Not to iterate from `0001` to `9999` with FindFirst / FindNext. – moskito-x Jan 30 '14 at 15:19
  • Nobody is proposing iterating from `0001` to `9999`. How do you think the file list box control populates itself? What Windows API call lies underneath do you think? – David Heffernan Jan 30 '14 at 15:28
  • 1
    That's neither fast nor convenient – jpfollenius Jan 30 '14 at 17:10
  • @jpfollenius : If my only one proposal is slow (0.078 sec.) ! How would you then call your solution ? incredibly slow (17.25) sec. – moskito-x Jan 31 '14 at 01:09
  • @moskito-x The difference is that I never claimed it to be fast. – jpfollenius Jan 31 '14 at 14:32
  • @jpfollenius : `0.078 sec.` is fast . if you need it only once. for example : during a application start. comparison : `about 100 ms - human blink; human reaction time` . I only want to show another way . You showed us another possibility that requires 17 sec. I have no problem with it. – moskito-x Jan 31 '14 at 14:49
  • @moskito-x How can `TFileListBox` be faster than raw `FindFirstFile`? `TFileListBox` is a wrapper around `FindFirstFile` and therefore must be slower. – David Heffernan Feb 01 '14 at 15:21
  • @DavidHeffernan : I know the `TFileListBox` is slower. But I use this only for a maximum of 9999 files . For a `.mp3` program I need this list. My users can have 500000 or more `.mp3's`. I allow only folder with a maximum of 99999 files . So I need never longer than 0.350 seconds for the list. With FindFirst I restrict the selection down to eg Soul -Classic 9999 files and place them for disposal. – moskito-x Feb 01 '14 at 18:38
  • In my case, and I want it my users not expect by 300000 and more Scrolling mp3 files. To find the highest file I always write the last 3 highest filename into a `.ini` file. When the application starts, It starts with the largest file from the option.ini and search with FindFirst if it is still available , looking with FindNext only a small number of newly added files. If not with the second highest and so on . . Only if neither of these files exists , I 'm going with eg `CONST_9*.XYZ` downto `CONST_0*.XYZ` the long way. – moskito-x Feb 01 '14 at 18:39
  • The advantage of this list : Only one example ( with `FileListBox.items.count` and the highest filename was found, quickly and without further effort , to determine whether gaps in the arrangement of the file names are . etc. – moskito-x Feb 01 '14 at 18:39
  • You simply don't understand me at all. Your control is implemented with FindFirstFile and therefore is slower. – David Heffernan Feb 01 '14 at 18:51
  • @DavidHeffernan : From your Answer : `And I believe that the most efficient way to do that is to use FindFirstFile/FindNextFile` . And now you complain that `Your control is implemented with FindFirstFile` ? – moskito-x Feb 01 '14 at 19:15
  • You claim the a GUI control based on FFF is faster than FFF – David Heffernan Feb 01 '14 at 19:25
  • @DavidHeffernan , No,no,no . I never claimed that . I just wanted to show a different way to find the highest file. In my time comparison it is clearly described. So my way is **0.039** sec **slower** than without GUI. Everyone can see it. – moskito-x Feb 01 '14 at 19:36