0

This code finds a txt file with a variable name in a folder structure with variable names.

The underlined parts are the variable parts that I can't know enter image description here

and further back the folder structure looks like this: enter image description here

sub Main()
    Dim FileSystem As Object
    Dim HostFolder As String

    ' for loop that loops multiple days omitted for simplicity
    HostFolder = "H:\Dokument\Avvikelser\" & Format(dd, "YYYY\\mm\\dd") ' dd is the date it should look at

    Set FileSystem = CreateObject("Scripting.FileSystemObject")
    DoFolder FileSystem.GetFolder(HostFolder)
End sub

Sub DoFolder(Folder)
    Dim SubFolder
    For Each SubFolder In Folder.SubFolders
        'Debug.Print SubFolder
        ' since I don't need to look in the folder called 51562 I added the if below
        If IsNumeric(Right(SubFolder, 5)) = False Then DoFolder SubFolder
        DoEvents
    Next
    Dim File
    
    Dim objStream
    Set objStream = CreateObject("ADODB.Stream")
    
    For Each File In Folder.Files
        ' only the text file is interesting 
        If Right(File, 3) = "txt" Then
            Debug.Print File
            objStream.Charset = "utf-8"
            objStream.Open
            objStream.LoadFromFile (File)
    
            cont = objStream.ReadText()
            objStream.Close
            Set objStream = Nothing
            ' strData is a public string that is appended with multiple textfiles contents
            strData = strData & cont
            DoEvents
        End If
    Next
End Sub

The barebone of the code is copied from https://stackoverflow.com/a/22645439/5159168
The issue I have is that even though I have tried to make it only traverse the parts it has to it still takes about 2-2.5 seconds between each file is read.
Is there anything that I can do to make this faster?

Andreas
  • 23,610
  • 6
  • 30
  • 62
  • Probably a question for https://codereview.stackexchange.com if there is no actual issue in the code. How many subfolders does it ned to run through? What is the average amount of iterations the code runs through until it finds a txt file (put a global variable as counter to check out). I would try to first collect all the paths of the txt files in an array or collection and in a second step process the files using the list of paths from the first step. So it is probably more clear which is the time consuming part and easier to optimize one of the processes. – Pᴇʜ Feb 02 '21 at 10:26
  • 1
    The rule: Keep your procedures small and keep them doing only one thing (and do that right). Usually leads to better code and is easier to optimize. – Pᴇʜ Feb 02 '21 at 10:28
  • What you see in the images is the things it iterates. I have minimized it as much as possible (except the pdf file). In the folder YEAR\MONTH\DAY there is all the suppliers that has had quality issues. Each supplier could potentially have more than one product with issues per day, the example above is one supplier with strawberries. But in each folder with the product name there is only one text file just as the example shows. So in short it iterates as little as possible (except the pdf which is probably negligable) – Andreas Feb 02 '21 at 10:49
  • 1
    Note that you **always** create a adodb stream `Set objStream = CreateObject("ADODB.Stream")` even if the file is not a txt file. Make sure you put those 2 lines within `If` block `If Right(File, 3) = "txt" Then`. Creating the stream might take unnecessaryily time. • The loop through the folders should actually be pretty quick. Actually this kind of mistakes is why I suggested to strip down your procedures to do only one thing. Mistakes are easier to figure out then. – Pᴇʜ Feb 02 '21 at 10:55

1 Answers1

1

Example of how to split your procedures (so each of them does less things): The first one GetAllTextFilesFromNonNumericSubFolders just collects the text files of all non numeric sub folders and collects them in a FileList. The second one ProceedTextFiles then uses the FileList to work with these text files.

Now you can easily check out which one is the bottleneck. Just comment out ProceedTextFiles FileList in your Main() procedure. If this runs fast the bottleneck is not looping through the folders. If it is slow, you can try to find an more optimized way of collecting the text files.

Option Explicit

Public Sub Main()
    Dim HostFolder As String
    HostFolder = "H:\Dokument\Avvikelser\" & Format(dd, "YYYY\\mm\\dd")

    Dim FileSystem As Object
    Set FileSystem = CreateObject("Scripting.FileSystemObject")
    
    Dim FileList As New Collection

    GetAllTextFilesFromNonNumericSubFolders FileSystem.GetFolder(HostFolder), FileList
    
    If FileList.Count > 0 Then
        ProceedTextFiles FileList
    Else
        MsgBox "No text files found in '" & HostFolder & "'.", vbExclamation
    End If
End Sub


Public Sub GetAllTextFilesFromNonNumericSubFolders(ByVal Folder As Folder, ByRef FileList As Collection)
    Dim SubFolder As Variant
    For Each SubFolder In Folder.SubFolders
        'Debug.Print SubFolder
        ' since I don't need to look in the folder called 51562 I added the if below
        If IsNumeric(Right(SubFolder, 5)) = False Then GetAllTextFilesFromNonNumericSubFolders SubFolder, FileList
        DoEvents  'ToDo: remove this line to optimize speed
    Next
    
    Dim File As Variant
    For Each File In Folder.Files
        ' only the text file is interesting
        If Right(File, 3) = "txt" Then
            Debug.Print File  'ToDo: remove this line to optimize speed
            FileList.Add File
        End If
    Next
End Sub

Public Sub ProceedTextFiles(ByVal FileList As Collection)
    Dim objStream As Variant
    Set objStream = CreateObject("ADODB.Stream")

    Dim File As Variant
    For Each File In FileList
        Debug.Print File  'ToDo: remove this line to optimize speed
        objStream.Charset = "utf-8"
        objStream.Open
        objStream.LoadFromFile File

        cont = objStream.ReadText()
        objStream.Close
        Set objStream = Nothing
        ' strData is a public string that is appended with multiple textfiles contents
        strData = strData & cont
        
        DoEvents  'ToDo: remove this line to optimize speed
    Next File
End Sub

Note that the ByRef FileList in the GetAllTextFilesFromNonNumericSubFolders procedure is a return variable. Because it is ByRef the changes made in teh procedure are returned back to the Main procedure.


Alternative:

Give the follwing a try. This uses the dir *.txt /A-H-S /B /S command to go through the subfolders and should be much quicker.

Option Explicit

Public Sub Main()
    Dim HostFolder As String
    HostFolder = "H:\Dokument\Avvikelser\" & Format(dd, "YYYY\\mm\\dd")

    Dim FileList() As String
    FileList = GetAllTextFilesSubFolders(HostFolder)
    
    If (Not Not FileList) <> 0 Then
        ProceedTextFiles FileList
    Else
        MsgBox "No text files found in '" & HostFolder & "'.", vbExclamation
    End If
End Sub


Public Function GetAllTextFilesSubFolders(ByVal Folder As String) As Variant
    GetAllTextFilesSubFolders = Split(CreateObject("WScript.Shell").Exec("cmd /c dir """ & Folder & "\""*.txt /A-H-S /B /S").StdOut.ReadAll, vbNewLine)
End Function



Public Sub ProceedTextFiles(ByRef FileList() As String)
    Dim objStream
    Set objStream = CreateObject("ADODB.Stream")

    Dim File As Variant
    For Each File In FileList
        Debug.Print File
        objStream.Charset = "utf-8"
        objStream.Open
        objStream.LoadFromFile File

        cont = objStream.ReadText()
        objStream.Close
        Set objStream = Nothing
        ' strData is a public string that is appended with multiple textfiles contents
        strData = strData & cont
        
        DoEvents
    Next File
End Sub

Added by OP.

UTF-8 version
Since you can't read the cmd window in UTF-8, the easy dirty hack is to replace the wrong characters in the string before you split.

Public Function GetAllTextFilesSubFolders(ByVal Folder As String) As Variant
    GetAllTextFilesSubFolders = Split(Replace(Replace(Replace(CreateObject("WScript.Shell").Exec("cmd /c dir """ & Folder & "\""*.txt /A-H-S /B /S").StdOut.ReadAll, "†", "å"), "„", "ä"), "”", "ö"), vbNewLine)
End Function

2 seconds! enter image description here

Andreas
  • 23,610
  • 6
  • 30
  • 62
Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
  • This actually makes no difference. I added a debug.print now() before and after both our codes and your code took 12 seconds, and mine 8. Both codes have the doevents and only one debug.print to print out the text files. The only difference between our codes is that you filter out the pdf file in your second loop, but I do that well... also in the second loop but it's not processed more or less. https://i.stack.imgur.com/I5VCQ.png – Andreas Feb 02 '21 at 12:57
  • @Andreas As I said: Comment out `ProceedTextFiles FileList` and test the time then. So we know which part is the slow one. How many seconds does it take when this is commented? – Pᴇʜ Feb 02 '21 at 13:10
  • 7 seconds without `ProceedTextFiles FileList` – Andreas Feb 02 '21 at 13:12
  • Well then looping through your folders takes 7 seconds. There is probably nothing that can improve that unless you can somehow reduce the amount of folders/files that need to be checked. – Pᴇʜ Feb 02 '21 at 13:19
  • Ok.. Thanks. There is not much I can do with the folder/files. I have already omitted all the files within the red with the numeric check. https://i.stack.imgur.com/9M8wH.png the only thing left is what has to be read. – Andreas Feb 02 '21 at 13:49
  • @Andreas checkout my alternative in the edited answer. It actually performs a `dir *.txt /A-H-S /B /S` on the shell and returns that in an array. So you get all the file paths that end on `.txt` including the sub folders. – Pᴇʜ Feb 02 '21 at 14:29
  • @Pᴇʜ: You noted that the OP's code created a new ADODB.Stream for each folder. Your code creates one for each file. BTW, do you really need ADODB.Stream just for reading text files? Isn't `Open file1 For Input As #1` sufficient? Or declaring WinAPI's CreateFile, ReadFile and CloseHandle? – z32a7ul Feb 02 '21 at 14:49
  • @z32a7ul True, I changed it so it creates only one stream. Yes, there might be other ways to open the file, no idea which one is the most efficient in this case (someone has to test that). But actually 7 seconds to go through the folders just to find the `*.txt` files is probably the bottle neck. – Pᴇʜ Feb 02 '21 at 14:53
  • I'll have a look at this tomorrow at work. Regarding ADODB yes that is sadly needed. As far as I know I can't read utf-8 any other way. – Andreas Feb 02 '21 at 21:25
  • 1
    Very very close https://i.stack.imgur.com/wGdNM.png just the pesky utf 8 thing again.. Also there is a `\\\` missing in the `"""`, it should be `"\\\""` – Andreas Feb 03 '21 at 07:04
  • @Andreas yes sadly UTF-8 is still an issue. – Pᴇʜ Feb 03 '21 at 07:51