2

I'm looking for the fastest-performing method to recursively search subdirectories for a filepattern using an Excel macro. Excel VBA seems to be rather slow at this.

Things I've tried so far (some based on other stackoverflow suggestions):

  • Exclusive use of Dir to recurse through subdirectories and search for the filepattern in each folder. (slowest)
  • Iterating through FileSystemObject Folders using Folder.Files collection, checking each file against filepattern. (better, but still slow)
  • Iterating through FileSystemObject Folders, and then using Dir to check each folder for the filepattern (fastest so far, but this is still taking several seconds per file and I'd like to optimize if possible)

I looked in to My.Computer.FileSystem.GetFiles, which seems like it would be perfect (allows you to specify a wildcard pattern and search subfolders with a single command) - but it doesn't appear to be supported in Excel VBA from what I can tell, only in VB.

I'm currently using the FindFile Sub below, which has the best performance so far. If anyone has suggestions for how to further improve this, I would be very grateful!

Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long


Function Recurse(sPath As String, targetName As String) As String

    Dim FSO As New FileSystemObject
    Dim myFolder As Folder
    Dim mySubFolder As Folder
    Dim myFile As File

    On Error Resume Next
    Set myFolder = FSO.GetFolder(sPath)
    If Err.Number <> 0 Then
        MsgBox "Error accessing " & sPath & ". The macro will abort."
        Err.Clear
        Exit Function
    End If
    On Error GoTo 0

    Dim foundFolderPath As String
    Dim foundFileName As String
    foundFolderPath = ""
    foundFileName = ""

    For Each mySubFolder In myFolder.SubFolders

        foundFileName = Dir(mySubFolder.Path & "\" & targetName & "*")
        If foundFileName <> vbNullString Then
            foundFolderPath = mySubFolder.Path & "\" & foundFileName
        End If

        If foundFolderPath <> vbNullString Then
            Recurse = foundFolderPath
            Exit Function
        End If

        foundFolderPath = Recurse(mySubFolder.Path, targetName)

        If foundFolderPath <> vbNullString Then
            Recurse = foundFolderPath
            Exit Function
        End If
    Next

End Function


Sub FindFile()

    Dim start As Long
    start = GetTickCount()

    Dim targetName As String
    Dim targetPath As String
    targetName = Range("A1").Value 'Target file name without extension
    targetPath = "C:\Example\" & Range("B1").Value 'Subfolder name

    Dim target As String
    target = Recurse(targetPath, targetName)

    Dim finish As Long
    finish = GetTickCount()

    MsgBox "found: " & target & vbNewLine & vbNewLine & (finish - start) & " milliseconds"

End Sub

Updated File Search Function Based on Accepted Answer

This version of FindFile() performs about twice as fast as the method I originally pasted in the question above. As discussed in the posts below, this should work for 32 or 64-bit versions of Excel 2010 and newer.

Option Explicit

Private Declare PtrSafe Function FindClose Lib "kernel32" (ByVal hFindFile As LongPtr) As Long
Private Declare PtrSafe Function FindFirstFileW Lib "kernel32" (ByVal lpFileName As LongPtr, ByVal lpFindFileData As LongPtr) As LongPtr
Private Declare PtrSafe Function FindNextFileW Lib "kernel32" (ByVal hFindFile As LongPtr, ByVal lpFindFileData As LongPtr) As LongPtr

Private Type FILETIME
  dwLowDateTime  As Long
  dwHighDateTime As Long
End Type

Const MAX_PATH  As Long = 260
Const ALTERNATE As Long = 14

' Can be used with either W or A functions
' Pass VarPtr(wfd) to W or simply wfd to A
Private Type WIN32_FIND_DATA
  dwFileAttributes As Long
  ftCreationTime   As FILETIME
  ftLastAccessTime As FILETIME
  ftLastWriteTime  As FILETIME
  nFileSizeHigh    As Long
  nFileSizeLow     As Long
  dwReserved0      As Long
  dwReserved1      As Long
  cFileName        As String * MAX_PATH
  cAlternate       As String * ALTERNATE
End Type

Private Const FILE_ATTRIBUTE_DIRECTORY As Long = 16 '0x10
Private Const INVALID_HANDLE_VALUE As LongPtr = -1

Function Recurse(folderPath As String, fileName As String)
    Dim fileHandle    As LongPtr
    Dim searchPattern As String
    Dim foundPath     As String
    Dim foundItem     As String
    Dim fileData      As WIN32_FIND_DATA

    searchPattern = folderPath & "\*"

    foundPath = vbNullString
    fileHandle = FindFirstFileW(StrPtr(searchPattern), VarPtr(fileData))
    If fileHandle <> INVALID_HANDLE_VALUE Then
        Do
            foundItem = Left$(fileData.cFileName, InStr(fileData.cFileName, vbNullChar) - 1)

            If foundItem = "." Or foundItem = ".." Then 'Skip metadirectories
            'Found Directory
            ElseIf fileData.dwFileAttributes And FILE_ATTRIBUTE_DIRECTORY Then
                foundPath = Recurse(folderPath & "\" & foundItem, fileName)
            'Found File
            'ElseIf StrComp(foundItem, fileName, vbTextCompare) = 0 Then 'these seem about equal
            ElseIf InStr(1, foundItem, fileName, vbTextCompare) > 0 Then 'for performance
                foundPath = folderPath & "\" & foundItem
            End If

            If foundPath <> vbNullString Then
                Recurse = foundPath
                Exit Function
            End If

        Loop While FindNextFileW(fileHandle, VarPtr(fileData))
    End If

    'No Match Found
    Recurse = vbNullString
End Function

Sub FindFile()

    Dim targetName As String
    Dim targetPath As String
    targetName = Range("A4").Value
    targetPath = "C:\Example\" & Range("B4").Value

    Dim target As String
    target = Recurse(targetPath, targetName)

    MsgBox "found: " & target

End Sub
Community
  • 1
  • 1
jramm
  • 751
  • 1
  • 8
  • 26
  • You're already using functions from `kernel32` to time your routine, I'd suggest that somewhere in there are faster ways to search the drive. I'm not certain what the function calls are that you'd need, but I'd think a look through the Win API would be in order. – FreeMan May 28 '15 at 15:54
  • If speed is critical on this part, can you move it outside Excel? Not sure about speed, but here is a [related post on that front](http://stackoverflow.com/questions/2784367/capture-output-value-from-a-shell-command-in-vba). – Byron Wall May 28 '15 at 16:03
  • +1 for the question. Not sure if this is fast, but have you looked at [FindFirstFile](https://msdn.microsoft.com/en-us/library/windows/desktop/aa364418(v=vs.85).aspx) and [FindNextFile](https://msdn.microsoft.com/en-us/library/windows/desktop/aa364428(v=vs.85).aspx)? There is an example [here](http://www.freevbcode.com/ShowCode.asp?ID=1331). API calls should have a performance hit, but overall they might be faster. – Ioannis May 28 '15 at 16:20
  • For future reference, if you're posting a question about the 64 bit version of Excel you should specify that. Microsoft recommends the 32 bit version for compatibility reasons. If you're not using the recommended compatible version, you want to tell people so they can help you properly. – AndASM May 29 '15 at 22:20
  • I found using the windows shell to be MUCH faster than native VBA or the FileSystemObject methods. – Ron Rosenfeld May 29 '15 at 22:38

3 Answers3

0

Use FindFirstFile or FindFirstFileEx. The built in native APIs will perform much faster than VBA.

The answer is here on stackoverflow: https://stackoverflow.com/a/3865850/2250183
As stated in that answer, you can find example code here: http://www.xtremevbtalk.com/showpost.php?p=1157418&postcount=4

Updated Example with 64 bit Support

This code should work for Excel 2010 and later on 64 bit and 32 bit. It won't work on earlier releases of Excel. I suggest you read the documentation on 64 bit support in VBA if you plan on using the 64 bit version. The documentation also explains how to add support for earlier versions of Excel.

Option Explicit

Private Declare PtrSafe Function FindClose Lib "kernel32" (ByVal hFindFile As LongPtr) As Long
Private Declare PtrSafe Function FindFirstFileW Lib "kernel32" (ByVal lpFileName As LongPtr, ByVal lpFindFileData As LongPtr) As LongPtr
Private Declare PtrSafe Function FindNextFileW Lib "kernel32" (ByVal hFindFile As LongPtr, ByVal lpFindFileData As LongPtr) As LongPtr

Private Type FILETIME
  dwLowDateTime  As Long
  dwHighDateTime As Long
End Type

Const MAX_PATH  As Long = 260
Const ALTERNATE As Long = 14

' Can be used with either W or A functions
' Pass VarPtr(wfd) to W or simply wfd to A
Private Type WIN32_FIND_DATA
  dwFileAttributes As Long
  ftCreationTime   As FILETIME
  ftLastAccessTime As FILETIME
  ftLastWriteTime  As FILETIME
  nFileSizeHigh    As Long
  nFileSizeLow     As Long
  dwReserved0      As Long
  dwReserved1      As Long
  cFileName        As String * MAX_PATH
  cAlternate       As String * ALTERNATE
End Type

Private Const INVALID_HANDLE_VALUE As LongPtr = -1

Private Sub Form_Load()
  Dim hFile     As LongPtr
  Dim sFileName As String
  Dim wfd       As WIN32_FIND_DATA

  sFileName = "c:\*.*" ' Can be up to 32,767 chars

  hFile = FindFirstFileW(StrPtr(sFileName), VarPtr(wfd))

  If hFile <> INVALID_HANDLE_VALUE Then
    Do While FindNextFileW(hFile, VarPtr(wfd))
      Debug.Print Left$(wfd.cFileName, InStr(wfd.cFileName, vbNullChar) - 1)
    Loop

    FindClose hFile
  End If
End Sub
Community
  • 1
  • 1
AndASM
  • 9,458
  • 1
  • 21
  • 33
  • I'm running into problems trying to use this solution, because the FindFirstFile functions appear to require 32 bit pointers, but I'm on a 64-bit system. In particular, the use of StrPtr in the linked example code throws errors. Excel also requires me to declare FindFirstFile functions as PtrSafe (not sure if that will cause issues). Microsoft appears to have released a "hotfix" to address this issue (https://support.microsoft.com/en-us/kb/983246) but that's a dead end for me, because I need to be able to distribute this macro at work with no strings attached. – jramm May 28 '15 at 23:36
  • You don't need the hotfix, that's not what it does. You should read the documentation on 64 bit VBA. I've updated the example for you, but again, you should read the documentation if you're going to use the 64 bit version. You'll run into compatibility problems frequently, they are easy to fix, but you need to learn what they are and why they happen. – AndASM May 29 '15 at 22:16
0

I had a similar performance issue with I resolved with win API functions as suggested above, my problem was slightly different to yours as I didn't need to recursively search a directory tree, I just pulled filenames from a given folder into a collection, but you can probably adapt my code:

'for windows API call to FindFirstFileEx
Private Const INVALID_HANDLE_VALUE = -1
Private Const MAX_PATH = 260

Private Type FILETIME
    dwLowDateTime   As Long
    dwHighDateTime  As Long
End Type

Private Type WIN32_FIND_DATA
    dwFileAttributes    As Long
    ftCreationTime      As FILETIME
    ftLastAccessTime    As FILETIME
    ftLastWriteTime     As FILETIME
    nFileSizeHigh       As Long
    nFileSizeLow        As Long
    dwReserved0         As Long
    dwReserved1         As Long
    cFileName           As String * MAX_PATH
    cAlternate          As String * 14
End Type

Private Const FIND_FIRST_EX_CASE_SENSITIVE  As Long = 1
'MSDN: "This value is not supported until Windows Server 2008 R2 and Windows 7."
Private Const FIND_FIRST_EX_LARGE_FETCH     As Long = 2

Private Enum FINDEX_SEARCH_OPS
    FindExSearchNameMatch
    FindExSearchLimitToDirectories
    FindExSearchLimitToDevices
End Enum

Private Enum FINDEX_INFO_LEVELS
    FindExInfoStandard
    FindExInfoBasic 'MSDN: "This value is not supported until Windows Server 2008 R2 and Windows 7."
    FindExInfoMaxInfoLevel
End Enum

Private Declare Function FindFirstFileEx Lib "kernel32" Alias "FindFirstFileExA" ( _
ByVal lpFileName As String, ByVal fInfoLevelId As Long, lpFindFileData As WIN32_FIND_DATA, _
    ByVal fSearchOp As Long, ByVal lpSearchFilter As Long, ByVal dwAdditionalFlags As Long) As Long
Private Declare Function FindNextFile Lib "kernel32" Alias "FindNextFileA" ( _
    ByVal hFindFile As Long, lpFindFileData As WIN32_FIND_DATA) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long


Private Function GetFileNames(ByVal sPath As String) As Collection

    Dim fileInfo    As WIN32_FIND_DATA  'buffer for file info
    Dim hFile       As Long             'file handle
    Dim colFiles    As New Collection

    sPath = sPath & "*.*"

    hFile = FindFirstFileEx(sPath & vbNullChar, FindExInfoBasic, fileInfo, FindExSearchNameMatch, 0&, FIND_FIRST_EX_LARGE_FETCH)

    If hFile <> INVALID_HANDLE_VALUE Then
        Do While FindNextFile(hFile, fileInfo)
            colFiles.Add Left(fileInfo.cFileName, InStr(fileInfo.cFileName, vbNullChar) - 1)
        Loop

        FindClose hFile
    End If

    Set GetFileNames = colFiles

End Function
Coder375
  • 1,535
  • 12
  • 14
0

I spent a few days on this problem and came up with this code. Got the first part off a reddit post (thanked) and modified it a bit. I plugged a few different directory situations and the searches came in at similar times.

Found in 223 folders. FindFast: 23.42 Secs, Recurse: 24.14 Secs. If I pick a file in the last folder to check in FindFast, we get Found in 387 folders. FindFast: 62.82 Secs, Recurse: 0.3 Secs. So the order the folders are checked are not the same.

There are some differences to note. The original purpose of my code was to get all xl files based on a wildcard such as "*_ThisName.xlsx". It ends up giving me all 41 in 9 seconds. The 10 seconds I am able to shave off for my multiple file search is because I can specify that the files I am looking for are in a sub-directory that is named "Working", and also that I limit the directory count to 10 deep. I commented out those restrictions for this test and it added 10 seconds to finding just the one file.

I am still hoping we can get the search time down further.

    Function FindFast(TargetFolder As String, Patt As String)
       
        Dim Folder As Object, SubFolder As Object, File As Object
        Dim FQueue As New Collection

       'Test view all folders:
'       Dim FolderColl As New Collection
        
       Dim Count As Integer

        Dim fl As String
           
        With CreateObject("Scripting.FileSystemObject")

            FQueue.Add .GetFolder(TargetFolder)
            Do While FQueue.Count > 0
               Set Folder = FQueue(1)
               FQueue.Remove 1
                'Code for individual folder
                For Each SubFolder In Folder.subFolders

                    'Test view all folders:
                    FolderColl.Add SubFolder

                    'Only 10 folders deep
     '               Count = Len(SubFolder) - Len(Replace(SubFolder, "\", ""))
     '               If Count < 13 Then
                        FQueue.Add SubFolder
                   
             '           ' Only look for the file in Working folder
               '        If InStr(1, SubFolder, "Working") > 1 Then
    
                            fl = Dir(SubFolder & "\" & Patt)
                ' Added as exact match return.  Otherwise will find all with pattern match
                If fl <> "" Then
                    FindFast = SubFolder & "\" & fl
                    Exit Function
                End If
                               
                 '       End If
        '             End If
                Next SubFolder
            Loop

'   Test view all folders:
' Dim i As Long
'For i = 1 To FolderColl.Count
'                                Range("A" & i).value = FolderColl(i)
'Next i           

        End With
         FindFast = vbNullString
    End Function
    
    
    Sub FindFile()
        Dim StartTime As Double
        Dim SecondsElapsed As Double
        Dim target As String
    
        Dim targetName As String
        Dim targetPath As String
        targetName = "5-3-21_Order_Sent.xlsx"
       '    Patt = "*_Order_Sent.xlsx"
        ' or wild extension   Patt = "*_ThisName.*"
    
        targetPath = "\\Fulfill\Company\Orders\Completed"

        StartTime = Timer    
        target = FindFast(targetPath, targetName)
        Debug.Print target
        SecondsElapsed = Round(Timer - StartTime, 2)
        Debug.Print "FindFast: " & SecondsElapsed & " Secs"
    
        MsgBox "found FindFast: " & target & " - " & SecondsElapsed & " Secs"       

        StartTime = Timer   
        target = Recurse(targetPath, targetName)
        Debug.Print target
    
         SecondsElapsed = Round(Timer - StartTime, 2)
        Debug.Print "Recurse: " & SecondsElapsed & " Secs"
    
         MsgBox "found Recurse: " & target & " - " & SecondsElapsed & " Secs"
    
    End Sub

So in this form above, it is loading all the folders, which sifts them by subfolder and that subfolders folders oldest to newest. In one form I am looking to find a record in a file in a newest folder, with decreasing likelihood the older the directory. In the other usage I am looking to get a list of all the files, and looking for a record in probably a newest filer, but can be older, with a decreasing likelihood the older the file.

Keith Swerling
  • 136
  • 1
  • 6