1

The following macro does a great job of grouping files by folder, however, it is very slow when it is run on a directory with tens of thousands of files (like 'My Pictures'). Is there any way to speed it up?

Option Explicit
Sub cmdList()
Dim objShell    As Object
Dim objFolder   As Object
Dim sPath       As String
Dim fOut        As Variant
Dim r           As Integer
Dim listRng     As Range
Dim cell        As Range
Dim i           As Integer
Dim j           As Integer

Set objShell = CreateObject("Shell.Application")
Set objFolder = objShell.BrowseForFolder(0, "Select Folder", 0, 17)
If objFolder Is Nothing Then Exit Sub
Application.ScreenUpdating = False
sPath = objFolder.self.Path
Set objFolder = Nothing: Set objShell = Nothing

r = 6: Range(r & ":" & Rows.Count).Delete
Cells(r - 1, 1) = sPath

fOut = Split(CreateObject("WScript.Shell").exec("cmd /c dir """ & sPath & """ /a:-h-s /b /s").StdOut.ReadAll, vbNewLine)

Cells(r, 1).Resize(UBound(fOut), 1) = WorksheetFunction.Transpose(fOut)

Set listRng = Cells(r, 1).CurrentRegion
listRng.Sort Key1:=Cells(r, 1), Order1:=xlAscending, Header:=xlYes

For i = 1 To listRng.Count
    For j = i + 1 To listRng.Count
        If InStr(listRng.Cells(j), listRng.Cells(i)) Then
            With listRng.Cells(j)
                .Rows.Group
                .IndentLevel = .Rows.OutlineLevel - 1
            End With
        Else
            Exit For
        End If
    Next j
Next i
Application.ScreenUpdating = True
End Sub

The output I am looking to achieve is this:

Level 1....

Level 1

Level 2...

Level 2

Level 3...

Level 3

BadDogTitan
  • 103
  • 1
  • 9
  • At which stage is the slowdown occurring? – Ron Rosenfeld May 31 '19 at 12:32
  • It looks like you are having a nested loop, over the same range. Maybe you can rewrite the conditions to not have to do that twice. And/Or allocate the range to an array, and only touch the rows where needed, should probably help a bit. – FAB May 31 '19 at 12:37
  • The slow down is definitely in the nested loop, Ron. I am not sure how to implement DarXyde's suggestion, though. – BadDogTitan May 31 '19 at 12:49
  • Looping through 10K rows twice is 100M of iterations. What is the exact idea of the loop? How many are the grouped rows? If they are 2 and they are next to each other, this could be achieved with 1 loop only, thus a linear complexity is quite ok. – Vityata May 31 '19 at 13:00
  • That's why I am here, Vityata. No idea how to implement....is there a 'beginner;s VBA' thread I should be using??? – BadDogTitan May 31 '19 at 13:07
  • How about this one https://stackoverflow.com/questions/31414106/get-list-of-excel-files-in-a-folder-using-vba – Ole EH Dufour May 31 '19 at 14:47

1 Answers1

2

EDIT: answer is updated to change the grouping to above and to correct a condition that incorrectly indented and grouped the rows.

So it was an interesting problem to solve. In addition to the actual solution, I have some other tips that I normally include in my code that I'll call out as well. My solution is VERY fast as well. When I parsed the C:\Program Files\ directory tree (18,017 files), it ran in under 5 seconds.

  1. Declare your variables as close as possible to the point where they are used for the first time. This makes it much easier to determine the variable type and definition, and also helps to functionally group the code.
  2. Those logical groups can then be functionally isolated into separate functions and subs. This will make the main logic of your code much easier to grasp in a single quick view, rather than requiring the reader (probably YOU in a few months) to re-read large logic sections and digest it in order to understand it.

In my example code, I start off with three quick functions that tell you exactly what's going on:

Dim rootFolder As String
rootFolder = SelectFolder

Dim pathArray As Variant
pathArray = GetAllFiles(rootFolder)

Dim folderGroups As Object
Set folderGroups = BuildFolderDictionary(pathArray)

The first function is straightforward and closely follows your approach to selecting the root folder:

Private Function SelectFolder() As String
    '--- returns the user-selected folder as a string
    Dim objShell As Object
    Dim objFolder As Object
    Set objShell = CreateObject("Shell.Application")
    Set objFolder = objShell.BrowseForFolder(0, "Select Folder", 0, 17)
    If Not objFolder Is Nothing Then
        SelectFolder = objFolder.self.path
    End If
End Function

The next function (GetAllFiles) also uses your approach but instead of putting the results directly into the worksheet, it keeps the results in a memory-based array (at the bottom of this answer, I include the whole module in a single code block for copy/paste later):

Private Function GetAllFiles(ByVal rootPath As String, _
                             Optional onlyFolders As Boolean = False) As Variant
    '--- returns a sorted array of all filepaths in the given directory path
    Dim dirOptions As String
    If onlyFolders Then
        dirOptions = """ /a:d-h-s /b /s"
    Else
        dirOptions = """ /a:-h-s /b /s"
    End If
    Dim fOut() As String
    fOut = Split(CreateObject("WScript.Shell").exec("cmd /c dir """ & _
                                                    rootPath & _
                                                    dirOptions).StdOut.ReadAll, _
                 vbNewLine)
    QuickSort fOut, LBound(fOut), UBound(fOut)

    '--- the pathArray skips the first position from the fOut array
    '    because it's always blank, but add the root folder as the first entry
    Dim pathArray As Variant
    ReDim pathArray(1 To UBound(fOut) + 1, 1 To 1)
    pathArray(1, 1) = rootPath
    Dim i As Long
    For i = 2 To UBound(fOut) + 1
        pathArray(i, 1) = fOut(i - 1)
    Next i
    GetAllFiles = pathArray
End Function

Memory-based arrays are MUCH, MUCH faster than working directly off the worksheet with Cells or Ranges.

The next function (BuildFolderDictionary) works using the array of paths and works to build a list (a Dictionary) of unique folders within the folder hierarchy. Along the way, it also creates a "span" of the rows that the subfolder encompasses. This will be very useful later. Remember, we're doing all this in memory so it's quick.

Private Function BuildFolderDictionary(ByRef paths As Variant) As Object
    Dim folders As Object
    Set folders = CreateObject("Scripting.Dictionary")

    '--- scan all paths and create a dictionary of each folder and subfolder
    '    noting which items (rows) map into each dictionary
    Dim i As Long
    For i = LBound(paths) To UBound(paths)
        Dim folder As String
        Dim pos1 As Long
        If Not IsEmpty(paths(i, 1)) Then
            pos1 = InStrRev(paths(i, 1), "\")   'find the last folder separator
            folder = Left$(paths(i, 1), pos1)
            If Not folders.Exists(folder) Then
                '--- new (sub)folder, create a new entry
                folders.Add folder, CStr(i) & ":" & CStr(i)
            Else
                '--- extisting (sub)folder, add to the row range
                Dim rows As String
                rows = folders(folder)
                rows = Left$(rows, InStr(1, rows, ":"))
                rows = rows & CStr(i)
                folders(folder) = rows
            End If
        End If
    Next i

    '--- final fixup: the root folder group should always encompass all
    '    the entries (runs from the second row to the end)...
    '    and we'll also determine the indent level using the first entry
    '    as the baseline (level 1).  stored as "rows,level" e.g. "2:7,1"
    Dim rootSlashes As Long
    rootSlashes = Len(root) - Len(Replace(root, "\", "")) - 1
    folders(root) = "2:" & UBound(paths) & ",1"

    Dim slashes As Long
    folder = folders.Keys
    For i = 1 To UBound(folder)
        slashes = Len(folder(i)) - Len(Replace(folder(i), "\", ""))
        folders(folder(i)) = folders(folder(i)) & "," & _
                                     CStr(slashes - rootSlashes)
    Next i

    For Each folder In folders
        Debug.Print folder & " - " & folders(folder)
    Next folder

    Set BuildFolderDictionary = folders
End Function

The final two parts are to copy the memory array (of file paths) to the worksheet...

    Const START_ROW As Long = 6
    Dim pathRange As Range
    Set pathRange = Sheet1.Range("A" & START_ROW).Resize(UBound(pathArray) + 1, 1)
    pathRange = pathArray

and then apply both the indentation and grouping of the rows. We're using the dictionary of folder groups we created that has all of the subfolder rows nicely defined for us already...

    Const MAX_GROUP_LEVEL As Long = 8
    Dim rowGroup As Variant
    Dim level As Long
    Dim folderData As Variant
    Dim theseRows As String
    For Each rowGroup In folderGroups
        folderData = Split(folderGroups(rowGroup), ",")
        theseRows = folderData(0)
        level = folderData(1)
        With pathRange.rows(theseRows)
            .IndentLevel = level
            If level < MAX_GROUP_LEVEL Then
                .Group
            End If
        End With
    Next rowGroup

(I ran into an issue during testing when the program errored with a group level deeper than 8. So I put a limit in the logic to prevent the error.)

So now, the whole module in a single block:

Option Explicit

Public Sub ShowFilePaths()
    Dim rootFolder As String
    rootFolder = SelectFolder
    If rootFolder = vbNullString Then Exit Sub

    '--- quick fixup if needed
    rootFolder = rootFolder & IIf(Right$(rootFolder, 1) = "\", vbNullString, "\")

    Dim pathArray As Variant
    pathArray = GetAllFiles(rootFolder)

    Dim folderGroups As Object
    Set folderGroups = BuildFolderDictionary(rootFolder, pathArray)

    '--- when debugging, this block just clears the worksheet to make it
    '    easier to rerun and test the code
    On Error Resume Next
    With Sheet1
        .UsedRange.ClearOutline
        .UsedRange.Clear
        .Outline.SummaryRow = xlAbove
    End With
    Err.Clear
    On Error GoTo 0

    '--- copy the array to the worksheet
    Const START_ROW As Long = 6
    Dim pathRange As Range
    Set pathRange = Sheet1.Range("A" & START_ROW).Resize(UBound(pathArray), 1)
    pathRange = pathArray

    '------ now apply the indention levels to each line on the sheet
    '       and group the same rows
    Const MAX_GROUP_LEVEL As Long = 8
    Dim rowGroup As Variant
    Dim level As Long
    Dim folderData As Variant
    Dim theseRows As String
    For Each rowGroup In folderGroups
        folderData = Split(folderGroups(rowGroup), ",")
        theseRows = folderData(0)
        level = folderData(1)
        With pathRange.rows(theseRows)
            .IndentLevel = level
            If level < MAX_GROUP_LEVEL Then
                .Group
            End If
        End With
    Next rowGroup
End Sub

Private Function SelectFolder() As String
    '--- returns the user-selected folder as a string
    Dim objShell As Object
    Dim objFolder As Object
    Set objShell = CreateObject("Shell.Application")
    Set objFolder = objShell.BrowseForFolder(0, "Select Folder", 0, 17)
    If Not objFolder Is Nothing Then
        SelectFolder = objFolder.self.Path
    End If
End Function

Private Function GetAllFiles(ByVal rootPath As String, _
                             Optional onlyFolders As Boolean = False) As Variant
    '--- returns a sorted array of all filepaths in the given directory path
    Dim dirOptions As String
    If onlyFolders Then
        dirOptions = """ /a:d-h-s /b /s"
    Else
        dirOptions = """ /a:-h-s /b /s"
    End If
    Dim fOut() As String
    fOut = Split(CreateObject("WScript.Shell").exec("cmd /c dir """ & _
                                                    rootPath & _
                                                    dirOptions).StdOut.ReadAll, _
                 vbNewLine)
    QuickSort fOut, LBound(fOut), UBound(fOut)

    '--- the pathArray skips the first position from the fOut array
    '    because it's always blank, but add the root folder as the first entry
    Dim pathArray As Variant
    ReDim pathArray(1 To UBound(fOut) + 1, 1 To 1)
    pathArray(1, 1) = rootPath
    Dim i As Long
    For i = 2 To UBound(fOut) + 1
        pathArray(i, 1) = fOut(i - 1)
    Next i
    GetAllFiles = pathArray
End Function

Private Function BuildFolderDictionary(ByVal root As String, _
                                       ByRef paths As Variant) As Object
    Dim folders As Object
    Set folders = CreateObject("Scripting.Dictionary")

    '--- scan all paths and create a dictionary of each folder and subfolder
    '    noting which items (rows) map into each dictionary
    Dim folder As Variant
    Dim i As Long
    For i = LBound(paths) To UBound(paths)
        Dim pos1 As Long
        If Not IsEmpty(paths(i, 1)) Then
            pos1 = InStrRev(paths(i, 1), "\")   'find the last folder separator
            folder = Left$(paths(i, 1), pos1)
            If Not folders.Exists(folder) Then
                '--- new (sub)folder, create a new entry
                folders.Add folder, CStr(i) & ":" & CStr(i)
            Else
                '--- extisting (sub)folder, add to the row range
                Dim rows As String
                rows = folders(folder)
                rows = Left$(rows, InStr(1, rows, ":"))
                rows = rows & CStr(i)
                folders(folder) = rows
            End If
        End If
    Next i

    '--- final fixup: the root folder group should always encompass all
    '    the entries (runs from the second row to the end)...
    '    and we'll also determine the indent level using the first entry
    '    as the baseline (level 1).  stored as "rows,level" e.g. "2:7,1"
    Dim rootSlashes As Long
    rootSlashes = Len(root) - Len(Replace(root, "\", "")) - 1
    folders(root) = "2:" & UBound(paths) & ",1"

    Dim slashes As Long
    folder = folders.Keys
    For i = 1 To UBound(folder)
        slashes = Len(folder(i)) - Len(Replace(folder(i), "\", ""))
        folders(folder(i)) = folders(folder(i)) & "," & _
                                     CStr(slashes - rootSlashes)
    Next i

    For Each folder In folders
        Debug.Print folder & " - " & folders(folder)
    Next folder

    Set BuildFolderDictionary = folders
End Function

Private Sub QuickSort(ByRef Field() As String, ByVal LB As Long, ByVal UB As Long)
    '--- from https://stackoverflow.com/a/152333/4717755
    Dim P1 As Long, P2 As Long, Ref As String, TEMP As String

    P1 = LB
    P2 = UB
    Ref = Field((P1 + P2) / 2)

    Do
        Do While (Field(P1) < Ref)
            P1 = P1 + 1
        Loop

        Do While (Field(P2) > Ref)
            P2 = P2 - 1
        Loop

        If P1 <= P2 Then
            TEMP = Field(P1)
            Field(P1) = Field(P2)
            Field(P2) = TEMP

            P1 = P1 + 1
            P2 = P2 - 1
        End If
    Loop Until (P1 > P2)

    If LB < P2 Then Call QuickSort(Field, LB, P2)
    If P1 < UB Then Call QuickSort(Field, P1, UB)
End Sub
PeterT
  • 8,232
  • 1
  • 17
  • 38
  • PeterT, at "Private Function BuildFolderDictionary(ByRef paths As Variant) As Dictionary" I get a 'Compile error: User-defined type not defined'? I think I need to implement late-binding, because I don't want to have the end-user to have to enable the 'Microsoft Scripting Runtime' library. – BadDogTitan Jun 04 '19 at 00:49
  • That's fine. I've updated the answer to show the `Dictionary` implemented with late-binding. – PeterT Jun 04 '19 at 13:12
  • You were right about the speed, PeterT! Fantastically fast. However, the last group in the outputted list is not including the folders/files contained within the last folder. The indent level looks good, but the contents of the last folder are grouped in with the first level. – BadDogTitan Jun 06 '19 at 14:52
  • I'm not sure if that's an artifact of your folder structure or in the code. I don't see that issue when I run it against a few different folder hierarchies on my drive. You can try commenting out the `QuickSort` call and see if that makes any difference, but beyond that I would need more information to help resolve the issue if it's in the code. – PeterT Jun 06 '19 at 17:33
  • Actually, you must leave the `QuickSort` in the logic in order for the grouping to work out correctly. So I'm still not sure if there's an issue with the code. – PeterT Jun 06 '19 at 17:52
  • Curious....I just created a new folder with three subfolders containing one text document each. I ran the script on a new sheet and noted the same failed grouping of the last document into the last folder. When I debug.print the variable 'theseRows', I show 1:5, 2:2, 4:4, 6:6. I would expect to see (I think) 1:6, 2:2, 4:4, 6:6. The code appears to be missing grouping the last file(s) into the first group? – BadDogTitan Jun 06 '19 at 18:53
  • I finally see what you're saying. The `GetAllFiles` function was not including the root folder itself. I've updated the code above to include it in the array and I think it's showing what you want now. Give it a try. – PeterT Jun 06 '19 at 20:29
  • A little closer, but that last document is still hanging out there on its own without a folder for a home. It looks like the final result I am going for is to group the above folder set as: 1:7, 2:2, 4:4, 6:6. Alas, your code is beyond my skills to figure out how to get there.... – BadDogTitan Jun 06 '19 at 20:49
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/194560/discussion-between-petert-and-baddogtitan). – PeterT Jun 06 '19 at 20:55
  • After using this for some time, I have found the grouping is not behaving as expected. The last group of files is not being grouped at its subfolder level on the sheet. This needs to be corrected in the 'BuildFolderDictionary' function, but it escapes me. – BadDogTitan Jun 21 '19 at 12:10
  • Try commenting out the call to `QuickSort` in the `GetAllFiles` routine and see if that improves your results. – PeterT Jun 21 '19 at 13:50
  • 'QuickSort' seems to be good. The last group of files is not grouping, even though the 'For...Next' loop is calling for it. I can manually group the last group of files, but that is tedious on 10,000 files! – BadDogTitan Jun 21 '19 at 14:58
  • I suggest setting the code to stop inside the `For...Next` loop when you think it should be grouping those files. Use the `Debug.Assert` statement and figure out a condition to test that makes sense for your situation (more details [here](http://www.cpearson.com/excel/DebuggingVBA.aspx)). Walk through the logic until you can understand what the loop logic is looking for and what is different about your situation. Then you can figure out a fix. – PeterT Jun 21 '19 at 16:16
  • Thanks for the guidance, PeterT. I will step through it. – BadDogTitan Jun 21 '19 at 17:58
  • It's been a while, PeterT, but work has kept me away. The fix for the grouping issue was remarkably simple. Instead of using .group, I set .OutlineLevel property to the variable "level". Works swimmingly now. I have not been able to implement adding elements to the dictionary to differentiate folders from files, or to specify a text colour as we chatted about. – BadDogTitan Nov 10 '19 at 16:35
  • this is a brilliant piece of code, thanks! Does anyone know what parts I need to change in "Private Function SelectFolder" to be able to browse and select a folder instead of using the current popup listing all the folders as a tree? – new11 Sep 10 '22 at 05:03
  • Take a look at using the [`Application.FileDialog`](https://analystcave.com/vba-application-filedialog-select-file/) and/or the [`FileSystemObject`](https://www.wallstreetmojo.com/vba-filesystemobject/) – PeterT Sep 11 '22 at 18:23