0

I am currently working on a project that requires me to compile data from hundreds of spreadsheets in a given directory. My problem is I'm not sure how to handle different sub-revisions of files. For example the files are named:

  • File Name R1a.xlsx
  • File Name R1b.xlsx
  • File Name R1c.xlsx
  • File Name R2a.xlsx
  • File Name R2b.xlsx

For the above files I would only need to read from 1c and 2b. Is there a good way of determining which files need read, or could someone at least point me in a direction that I could look into? My initial thoughts were to loop through the characters in the file names and check for the largest letter that follows a number, but that seems like it would be incredibly tricky to code properly.

Thanks in advance!

Aaron
  • 3
  • 2
  • Rather than checking for the "largest" letter, as long as there is only one letter, you can just check for the highest character value by using `Asc`, so "a" has a value of 97, and "z" a value of 122. – Applecore May 15 '20 at 19:43

4 Answers4

0

You can store your list of partial file names you need to match in an array. Then loop through the partial names in the array and foreach partial name, loop through the directory to look for a match. VBA provides the InStr function to that you can use to test if a file name contains the partial name from your array.

In pseudocode:

myArray = [ 1c, 2b]

ForEach partialName in myArray 
  ForEach file in myDirectory
    If InStr(fileName, partialName) Then
      // Do something interesting
    End If
  Next file
Next partialName
jpf
  • 165
  • 10
0

There are a number of ways to approach this problem. If your filename domain is truly fixed as R{num}{prio}.xlsx, then note that the filenames constitute valid Excel cell addresses when {prio}.xlsx is stripped away. The resultant cell addresses from your example are R1 and R2. You can now use the R column of the current spreadsheet as a sparse vector to store the highest priority observed per cell (aka file). When all filenames have been examined and their highest priorities stored, it's now a simple matter of traversing the R column to pick up the files selected for processing.

Here's code that handles the aforementioned filename domain....

Sub ProcessFilesBasedOnFnamePriority()
    Dim filenames, fname As Variant
    Dim maxRowNum, nRowNum, i As Long
    Dim strFilePrefix, strCellPrio As String

    maxRowNum = 1
    filenames = listfiles("c:\temp\lots_of_files")

    ' make an assumption that all filenames begin with the same
    ' single-char prefix character.
    strFilePrefix = Left(filenames(1), 1)

    For Each fname In filenames
        Dim dotpos, suffixLen As Integer
        Dim strCellAddr, strFnamePrio

        dotpos = InStr(1, fname, ".")
        suffixLen = Len(Mid(fname, dotpos))
        ' assume priority is specified by a single char in fname
        strFnamePrio = LCase(Mid(fname, dotpos - 1, 1))
        strCellAddr = Left(fname, Len(fname) - (suffixLen + 1)) ' "+1" to account for priority char
        strCellPrio = Range(strCellAddr)
        If (Trim(strCellPrio) = "") Then
            Range(strCellAddr) = strFnamePrio  ' store first prio occurrence
        Else
            ' assume filename prio characters (e.g., [a-z]) sort lexicographically
            If strFnamePrio > strCellPrio Then
                Range(strCellAddr) = strFnamePrio
            End If
        End If
        nRowNum = CLng(Mid(strCellAddr, 2))
        If nRowNum > maxRowNum Then
            maxRowNum = nRowNum
        End If
    Next

    For i = 1 To maxRowNum
        strCellPrio = Trim(Range(strFilePrefix & CStr(i)))
        If strCellPrio <> "" Then
            fname = strFilePrefix & CStr(i) & strCellPrio & ".xlsx"
            Debug.Print fname    ' <-- do analysis on fname
        End If
    Next i
End Sub

The code for listfiles is here. Note that this solution will not handle stray files that don't follow the assumed naming pattern. You'll need to add checks to weed them out.

decaffeinated
  • 76
  • 1
  • 5
0

Use a excel sheet or tabular format to express the file desired.

The excel sheet only needs two columns, A for the filename without the sub-revision, and column B for the desired sub-revision.

Compile and compose this information and then your vba implementation use the excel worksheet/tabular format to iterate and read "only" those files.

Use whatever language to compose the tabular format (in my case, python is preferred) and try to use any thing you can "to determine the sub-revision".

This allows you to debug the results more easily and use whatever language to compile a worksheet formatted or tabular delimited file.

0

This works because .GetFolder returns a sorted list.

Option Explicit
Sub FilesSelecter()

Dim fs As Object
Dim TargetPath As String
Dim DirList As Object
Dim File As Object
Dim BaseName As String
Dim RootFileName As String
Dim SaveRootFileName As String
Dim SaveBaseName As String

Set fs = CreateObject("Scripting.FileSystemObject")
TargetPath = "C:\Users\BeastMstr\Documents\TestFiles"
Set DirList = fs.Getfolder(TargetPath)

SaveRootFileName = ""
For Each File In DirList.Files
    BaseName = fs.getbasename(File)
    RootFileName = Left(BaseName, Len(BaseName) - 1)

    If SaveRootFileName = RootFileName Or SaveRootFileName = "" Then
        SaveRootFileName = RootFileName
        SaveBaseName = BaseName
    Else
        '
        'Do Somethingwith SaveBaseName
        '
        Debug.Print SaveBaseName
        SaveRootFileName = RootFileName
        SaveBaseName = BaseName
    End If

Next
'
' Do something with the last file
'
Debug.Print SaveBaseName

End Sub
REXXman
  • 376
  • 2
  • 4