5

This turned into a rather long post, and there's not really an "answer" per say. I'm more looking for an explanation as opposed to some silver bullet to fix the problem. As such, any aspect you'd like to answer would be quite appreciated. Thanks in advance!


I'm running into what may be a "problem" with the file system object, and that's lead to a question about the functionality etc. of how the File System Object in VBA works vs. "something else" (I don't know if there's an alternative to use in Excel for what I'm doing) in .net etc. I don't know of a better place to ask, and I'm not sure what to look into to research it for myself. So here I am!

So! To the problem. The short explanation is that I iterate through folders, gathering file information (name, extension, full path, etc.) and place it into a spreadsheet. I eventually use this information to copy the files to a new location. However, on a large scale (1,000+ files) this seems to work just fine locally, but it is considerably slower on a network location (at work). It will chew through like 1,500 files, wait a while, do 1,500 more etc. Either while listing or copying the files. Again, this is not the case when done locally, it will just run through without issue, so I can probably assume it's probably nothing to do with my code. It's almost as if the network is opening and closing a gate intermittently.

Alternatively, using other programs from an end user perspective (I tried it against the same files I was using with my program, on our work network) it is MUCH faster without any of the aforementioned delays. I'm assuming the alternative program is using some version of .net, if it matters. Long story short, I don't think I can inherently blame our network for the speed issues I'm running into.

So my question/curiosity/issue comes down to a few key points:

-What's the difference between the FSO in VBA and the default libraries in .Net, and could the difference between the cause of the issue I'm running into? Clearly it's possible to read this sort of data much more quickly than it is being done.

-Is the FSO not intended to be used this way (over a network, with large amounts of remote data, or... ?)? Is it just dated/outmoded? And is there an alternative that can be used through VBA?

-I only nebulously understand that our network functions in a different way than a local drive. It stores many terabytes of data, etc. and I'm not sure what the difference is at a very deep level between accessing a local drive and a network location. I know I'm not giving details on the network that would probably be very beneficial in diagnosis, I just don't the information unfortunately. I guess I'd just ask if it "potentially" an explanation that using the FSO in such a way with some/all sorts of networks is just not the way it's meant to be used. Is it possible that the network is set up in such a way to limit the sort of way I'm trying to interact with it?

-Even though I haven't run into any issues doing this locally, is it possible that something in my code is much more taxing to a network location vs. a local drive?

Thanks for any insight you can provide.

Finch042
  • 307
  • 3
  • 9
  • 18
  • 1
    Have you seen [this](http://stackoverflow.com/questions/10380312/loop-through-files-in-a-folder-using-vba?rq=1) post? It relates to using Dir rather than FSO and is apparently a lot faster. – Graham Anderson Sep 02 '13 at 22:45

4 Answers4

5

Finch042 acknowledges that he is only "nebulous" about the specifics of what is different when accessing a network server's file system vs. a local file system, and that his question is really about the relative speed different between those two circumstances. All of the other posts here assume that the issue is with his design choices and/or coding techniques but I think the underlying question has gone unanswered: why is it that network file operations can be so much slower?

The short answer is that a networked file system is on a different computer's disc at the end of LAN cable (or, worse, a Wifi signal), and such intermediary technology is much more limited in its data-transfer bandwidth than the electronics between a computer's processor and its local disc. It is true that modern LAN capacities are, relative to the stone-age, blindingly fast, but they are still way, way slower than the disc-interface electronics on a PC's motherboard. So you will always experience some level of performance degradation when accessing remote files.

Furthermore, many modern server farm systems may include mirroring (i.e. storage redundancy) for data-integrity maintenance and may also include automatic version-backup capabilities, both of which can add access time to some server operations, especially when writing new files or updating existing ones.

As for the fluctuations in the data transfer rates to/from the server, which Finch042 describes as an apparent "gating" of the data flow: whenever you are using a common-access technology, such as LAN systems and shared servers, you are usually competing with others who are trying to do similar stuff. For example, LAN technologies such as traditional Ethernet actually allow the various users to stomp all over each other’s transmission attempts and, when that does result in a failed attempt, it retrys until it succeeds. It is a design that trades simplicity and, thereby, ultimate overall reliability, for a (usually) minor loss in throughput speed. But when the demand on the network is high, it can result in a dramatic degradation in throughput for all users.

Similarly, a file server has a limited capacity to service file-system access requests and it, too, can become overloaded at times of high demand.

I suspect that Finch042's experience is likely related to those kinds of issues, especially if his organization's network and server system grew incrementally, and therefore in a non-optimized way, over a long time, and/or is at or near its capacity limit. And his experience of inconsistent data-transfer rates is likely just the ebb and flow of demand on the common, shared network/server systems.

Also, be aware that virus protection systems can interfere with file access speeds, especially for network server files.

pstraton
  • 1,080
  • 14
  • 9
2

(I'm posting as an answer as the following is too long for a comment.)

I get the impression you might be feeding values into Excel cells one at a time, or maybe a row at a time. I would use an array Dim arr(100, 4) As String fill it with values then fill a large range in one go Range("A1:E101") = arr. I would experiment with the size of 100 as I suspect it could be much larger. In preference to FSO I would use (VBA methods) Dir, FileCopy and Kill, only using FSO where necessary.

VB.NET has a number of other options, such as Lists (of a Class, perhaps) in-memory Stream, StringBuilder. However, if Excel Interop is still needed, then the advantage of these approaches may be lost. In which case I might consider writing to a csv file, which can be opened directly by Excel. Excel Interop could still be used, but I would write to the csv and then open it (as a single statement) in Excel.

Logically, I assume it would be more efficient to create this text-file in the same location as the network files, then move it afterwards - but someone might correct this assumption.

Andy G
  • 19,232
  • 5
  • 47
  • 69
  • Fine suggestions. I am writing straight to the sheet (with screen updating off), but I can definitely try an array if that's going to improve matters. The real problem (and I should have mentioned this) is that the FSO is the only way to reliably provide things like file extensions if your file doesn't have an extension, or has multiple periods in the name. etc. I think were a few other, similar, issues I ran into when I was writing the code. Might have to just sort it out anyway! Thanks for your ideas. – Finch042 Aug 22 '13 at 21:13
  • Personally, I don't think use of FSO will be a bottleneck, I suspect it is more likely to be the frequent reading and writing from the network. Good luck. – Andy G Aug 22 '13 at 21:28
0

Instead of using FSO, I would use DIR() if I want faster speed.
However, it is not so fail-safe so you would need to conduct couple of tests and make sure it works in all occasions.
For example, you may need to check individual parent folder in order to make sure they exist.

Anyways, Dir() should be faster because it is a native function.

Another way of solving this would be to use Batch (if you're on Widows of course!) or use command line to simply copy from one file to another file. You should see a dramatic increase in speed and you don't need to worry about checking every single subfolder for existence!

I happen to have a VBA code that would use windows commandline to do what I want. I got it from the internet but tweaked some error acknowledgements to bypass what I wanted to do:

Option Explicit
Option Base 0
Option Compare Text

Private Type SECURITY_ATTRIBUTES
    nLength As Long
    lpSecurityDescriptor As Long
    bInheritHandle As Long
End Type

Private Type PROCESS_INFORMATION
    hProcess As Long
    hThread As Long
    dwProcessId As Long
    dwThreadId As Long
End Type

Private Type STARTUPINFO
    cb As Long
    lpReserved As Long
    lpDesktop As Long
    lpTitle As Long
    dwX As Long
    dwY As Long
    dwXSize As Long
    dwYSize As Long
    dwXCountChars As Long
    dwYCountChars As Long
    dwFillAttribute As Long
    dwFlags As Long
    wShowWindow As Integer
    cbReserved2 As Integer
    lpReserved2 As Byte
    hStdInput As Long
    hStdOutput As Long
    hStdError As Long
End Type

Private Const WAIT_INFINITE         As Long = (-1&)
Private Const STARTF_USESHOWWINDOW  As Long = &H1
Private Const STARTF_USESTDHANDLES  As Long = &H100
Private Const SW_HIDE               As Long = 0&

Private Declare Function CreatePipe Lib "kernel32" (phReadPipe As Long, phWritePipe As Long, lpPipeAttributes As SECURITY_ATTRIBUTES, ByVal nSize As Long) As Long
Private Declare Function CreateProcess Lib "kernel32" Alias "CreateProcessA" (ByVal lpApplicationName As Long, ByVal lpCommandLine As String, lpProcessAttributes As Any, lpThreadAttributes As Any, ByVal bInheritHandles As Long, ByVal dwCreationFlags As Long, lpEnvironment As Any, ByVal lpCurrentDriectory As String, lpStartupInfo As STARTUPINFO, lpProcessInformation As PROCESS_INFORMATION) As Long
Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, lpOverlapped As Any) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Declare Function WaitForSingleObject Lib "kernel32" (ByVal hHandle As Long, ByVal dwMilliseconds As Long) As Long
Private Declare Function GetExitCodeProcess Lib "kernel32" (ByVal hProcess As Long, lpExitCode As Long) As Long
Private Declare Sub GetStartupInfo Lib "kernel32" Alias "GetStartupInfoA" (lpStartupInfo As STARTUPINFO)
Private Declare Function GetFileSize Lib "kernel32" (ByVal hFile As Long, lpFileSizeHigh As Long) As Long

Public Function Redirect(szBinaryPath As String, szCommandLn As String) As String

Dim tSA_CreatePipe              As SECURITY_ATTRIBUTES
Dim tSA_CreateProcessPrc        As SECURITY_ATTRIBUTES
Dim tSA_CreateProcessThrd       As SECURITY_ATTRIBUTES
Dim tSA_CreateProcessPrcInfo    As PROCESS_INFORMATION
Dim tStartupInfo                As STARTUPINFO
Dim hRead                       As Long
Dim hWrite                      As Long
Dim bRead                       As Long
Dim abytBuff()                  As Byte
Dim lngResult                   As Long
Dim szFullCommand               As String
Dim lngExitCode                 As Long
Dim lngSizeOf                   As Long

tSA_CreatePipe.nLength = Len(tSA_CreatePipe)
tSA_CreatePipe.lpSecurityDescriptor = 0&
tSA_CreatePipe.bInheritHandle = True

tSA_CreateProcessPrc.nLength = Len(tSA_CreateProcessPrc)
tSA_CreateProcessThrd.nLength = Len(tSA_CreateProcessThrd)

If (CreatePipe(hRead, hWrite, tSA_CreatePipe, 0&) <> 0&) Then
    tStartupInfo.cb = Len(tStartupInfo)
    GetStartupInfo tStartupInfo

    With tStartupInfo
        .hStdOutput = hWrite
        .hStdError = hWrite
        .dwFlags = STARTF_USESHOWWINDOW Or STARTF_USESTDHANDLES
        .wShowWindow = SW_HIDE
    End With

    szFullCommand = """" & szBinaryPath & """" & " " & szCommandLn
    lngResult = CreateProcess(0&, szFullCommand, tSA_CreateProcessPrc, tSA_CreateProcessThrd, True, 0&, 0&, vbNullString, tStartupInfo, tSA_CreateProcessPrcInfo)

    If (lngResult <> 0&) Then
        lngResult = WaitForSingleObject(tSA_CreateProcessPrcInfo.hProcess, WAIT_INFINITE)
        lngSizeOf = GetFileSize(hRead, 0&)
        If (lngSizeOf > 0) Then
            ReDim abytBuff(lngSizeOf - 1)
            If ReadFile(hRead, abytBuff(0), UBound(abytBuff) + 1, bRead, ByVal 0&) Then
                Redirect = StrConv(abytBuff, vbUnicode)
            End If
        End If
        Call GetExitCodeProcess(tSA_CreateProcessPrcInfo.hProcess, lngExitCode)
        CloseHandle tSA_CreateProcessPrcInfo.hThread
        CloseHandle tSA_CreateProcessPrcInfo.hProcess

        'If (lngExitCode <> 0&) Then Err.Raise vbObject + 1235&, "GetExitCodeProcess", "Non-zero Application exist code"

        CloseHandle hWrite
        CloseHandle hRead
    Else
        Err.Raise vbObject + 1236&, "CreateProcess", "CreateProcess Failed, Code: " & Err.LastDllError
    End If
End If
End Function

You would use the commandline through
resp = Redirect("cmd", strCmd)
where cmd is equivalent to pressing windows + R and strCmd is the string you input into that Run prompt.

To further answer your question about the difference in performance between local drives and network drives, working with network drives will always be slower in any type of code. The background code that runs when we access network drive is complex but I don't know the specifics.

Hope it helps,
Cheers,
kpark

kpark
  • 394
  • 2
  • 12
  • Thanks! I'll check it out. There's just a few tools that the FSO was providing that I had use for. Reliably getting the file extension, generating a short path, etc. But I might just have to find another way to do those sorts of things if the Dir function is that much faster. Thanks for the idea, and the code for that matter. Very helpful place to start. – Finch042 Aug 22 '13 at 21:15
  • No problem. VBA code is one of the slower language so it's better to use batch to do most of the stuff that deals with copying / pasting large number of files especially over a network since batch is a native language (I think..). – kpark Aug 23 '13 at 13:19
0

What do you mean by fast, for 1500 files on a network I think that the following implementation using FSO isn't too slow, but how fast were you hoping for?

Sub TestBuildFileStructure()
' Call to test GetFiles function.

Const sDIRECTORYTOCHECK As String = <enter path to check from as string>

Dim varItem         As Variant
Dim wkbOutputFile   As Workbook
Dim shtOutputSheet  As Worksheet
Dim sDate           As String
Dim sPath           As String
Dim lRowNumber      As Long
Dim vSplit          As Variant

sPath = ThisWorkbook.Path

sDate = CStr(Now)
vSplit = Split(sDate, "/")
sDate = vSplit(0) & vSplit(1) & vSplit(2)
vSplit = Split(sDate, ":")
sDate = vSplit(0) & vSplit(1) & vSplit(2)

sDate = "Check " & sDate

Set wkbOutputFile = Workbooks.Add
'wkbOutputFile.Name = sDate
Set shtOutputSheet = wkbOutputFile.Sheets.Add
shtOutputSheet.Name = "Output"

lRowNumber = 1


Call BuildFileStructure(sDIRECTORYTOCHECK, shtOutputSheet, lRowNumber, True)

wkbOutputFile.SaveAs (sPath & "\" & sDate)



Cleanup:

Set shtOutputSheet = Nothing
Set wkbOutputFile = Nothing

End Sub

Function BuildFileStructure(ByVal strPath As String, _
                ByRef shtOutputSheet As Worksheet, _
                ByRef lRowNumber As Long, _
                Optional ByVal blnRecursive As Boolean) As Boolean

   ' This procedure returns all the files in a directory into
   ' an excel file. If called recursively, it also returns
   ' all files in subfolders.

    Const iNAMECOLUMN As Integer = 1

    Dim fsoSysObj       As FileSystemObject
    Dim fdrFolder       As Folder
    Dim fdrSubFolder    As Folder
    Dim filFile         As File

    ' Return new FileSystemObject.
    Set fsoSysObj = New FileSystemObject

    On Error Resume Next
    ' Get folder.
    Set fdrFolder = fsoSysObj.GetFolder(strPath)

    If Err <> 0 Then
      ' Incorrect path.
        BuildFileStructure = False
        GoTo BuildFileStructure_End
    End If
    On Error GoTo 0

    ' Loop through Files collection, adding to dictionary.
    For Each filFile In fdrFolder.Files
      shtOutputSheet.Cells(lRowNumber, iNAMECOLUMN).Value = filFile.Path
       lRowNumber = lRowNumber + 1
    Next filFile

    ' If Recursive flag is true, call recursively.
    If blnRecursive Then
        For Each fdrSubFolder In fdrFolder.SubFolders
            Call BuildFileStructure(fdrSubFolder.Path, shtOutputSheet, lRowNumber, True)
        Next fdrSubFolder
    End If

    ' Return True if no error occurred.
    BuildFileStructure = True

BuildFileStructure_End:
    Set fdrSubFolder = Nothing
    Set fdrFolder = Nothing
    Set filFile = Nothing
    Set fsoSysObj = Nothing

    Exit Function
End Function
Andy G
  • 19,232
  • 5
  • 47
  • 69
Graham Anderson
  • 1,209
  • 10
  • 17
  • It's not so much a particular speed I'm looking for, so much as it seems to be extremely (probably more than 50x) slower doing this with large groups of files over the network vs. a local drive or versus a .net alternative doing the same things to the same stuff on a network. – Finch042 Aug 22 '13 at 21:09