I have a script that I use to extract metadata from files in a network directory. It originates here, and I modified it to obtain additional metadata (file size, file hash, date created, and last write time), but these additions appear to slow the script down so much that it takes weeks to complete when there are more than 10,000 files.

To illustrate the impact of the script additions on the speed, I ran the script on a folder containing five documents:

  • original script (no get-item or get-file hash lines): 2.9794699 seconds
  • with 'get-item' lines (size, created, lastwritetime): 7.6295035 seconds
  • with 'get-filehash' line : 6.9363834 seconds
  • with 'get-item' lines and 'get-filehash' lines: 12.4516334 seconds
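
Timings like the ones above can be gathered with `Measure-Command`. A minimal sketch, assuming a scratch file stands in for a document on the share (the `$sample` variable and the temp file are illustrative and not part of the script):

```powershell
# Scratch file standing in for a document on the network share (assumption).
$sample = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($sample, 'sample content')

# Time the Get-Item lookups (size, created, last write time).
$getItemTime = Measure-Command {
    $null = (Get-Item -LiteralPath $sample).Length
    $null = (Get-Item -LiteralPath $sample).CreationTime
    $null = (Get-Item -LiteralPath $sample).LastWriteTime
}

# Time the hash calculation on its own.
$hashTime = Measure-Command {
    $null = Get-FileHash -LiteralPath $sample | Select-Object -ExpandProperty Hash
}

'Get-Item: {0:N1} ms; Get-FileHash: {1:N1} ms' -f $getItemTime.TotalMilliseconds, $hashTime.TotalMilliseconds

# Clean up the scratch file.
Remove-Item -LiteralPath $sample
```

Numbers from a local temp file will be far smaller than numbers from a network share, so the sketch only shows how to separate the cost of each step.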

I tried putting all the get-item lines together in a for-loop thinking that it would be faster to retrieve the file once from the network, then extract the metadata. While this modified script runs at a much faster 8.6488492 seconds, the metadata fields are not included in the output.

Here's the script with my additions:

# Works on PowerShell version 5.1.
# The filepath of the folder being printed and the filepath where the output file will be placed need to be specified in the last line of the script.

Function Get-FolderItem {

    [cmdletbinding(DefaultParameterSetName='Filter')]
    Param (
        [parameter(Position=0,ValueFromPipeline=$True,ValueFromPipelineByPropertyName=$True)]
        [Alias('FullName')]
        [string[]]$Path = $PWD,
        [parameter(ParameterSetName='Filter')]
        [string[]]$Filter = '*.*',    
        [parameter(ParameterSetName='Exclude')]
        [string[]]$ExcludeFile,              
        [parameter()]
        [int]$MaxAge,
        [parameter()]
        [int]$MinAge
    )
    Begin {
        $params = New-Object System.Collections.Arraylist
        $params.AddRange(@("/L","/E","/NJH","/NDL","/BYTES","/FP","/NC","/XJ","/R:0","/W:0","T:W"))
        If ($PSBoundParameters['MaxAge']) {
            $params.Add("/MaxAge:$MaxAge") | Out-Null
        }
        If ($PSBoundParameters['MinAge']) {
            $params.Add("/MinAge:$MinAge") | Out-Null
        }
    }
    Process {
        ForEach ($item in $Path) {
            Try {
                $item = (Resolve-Path -LiteralPath $item -ErrorAction Stop).ProviderPath
                If (-Not (Test-Path -LiteralPath $item -Type Container -ErrorAction Stop)) {
                    Write-Warning ("{0} is not a directory and will be skipped" -f $item)
                    Return
                }
                If ($PSBoundParameters['ExcludeFile']) {
                    $Script = "robocopy `"$item`" NULL $Filter $params /XF $($ExcludeFile  -join ',')"
                } Else {
                    $Script = "robocopy `"$item`" NULL $Filter $params"
                }
                Write-Verbose ("Scanning {0}" -f $item)
                Invoke-Expression $Script | ForEach {
                    Try {
                        If ($_.Trim() -match "^(?<Children>\d+)\s+(?<FullName>.*)") {
                           $object = New-Object PSObject -Property @{
                                FullName = $matches.FullName
                                Extension = $matches.fullname -replace '.*\.(.*)','$1'
                                FullPathLength = [int] $matches.FullName.Length
                                Length = (Get-Item $matches.FullName).Length
                                FileHash = Get-FileHash -Path "\\?\$($matches.FullName)" |Select -Expand Hash
                                Created = (Get-Item $matches.FullName).creationtime
                                LastWriteTime = (Get-Item $matches.FullName).LastWriteTime
                                Owner = (Get-ACL $matches.Fullname).Owner
                            } 
                            $object.pstypenames.insert(0,'System.IO.RobocopyDirectoryInfo')
                            Write-Output $object
                        } Else {
                            Write-Verbose ("Not matched: {0}" -f $_)
                        }
                    } Catch {
                        Write-Warning ("{0}" -f $_.Exception.Message)
                        Return
                    }
                }
            } Catch {
                Write-Warning ("{0}" -f $_.Exception.Message)
                Return
            }
        }
    }
}

Get-FolderItem "O:\directory\to\files" | Export-Csv -Path C:\output.csv

Does anyone know how to make the script run faster?

oymonk
  • Why are you running `robocopy` at all? Is `Get-ChildItem` not working against the share? – Mathias R. Jessen May 06 '21 at 16:39
  • good question - I use robocopy in order to obtain the metadata of files that are longer than 260 characters. – oymonk May 06 '21 at 16:44
  • Rather than `(Get-Item $matches.FullName).length` you could do `([System.IO.FileInfo]$Matches.FullName).length`. I see much better performance from that (`Get-Item` taking about 3.5x longer). Same for LastWriteTime. – TheMadTechnician May 06 '21 at 23:20
  • YES! @TheMadTechnician this works!!! - it brought the speed of the script from 12.45 seconds to 1.399 seconds! This is seriously wonderful news. Can you post as answer and I'll mark it up? – oymonk May 07 '21 at 00:04

1 Answer

Rather than `(Get-Item $matches.FullName).length` you could do `([System.IO.FileInfo]$Matches.FullName).length`. I see much better performance from that (`Get-Item` taking about 3.5x longer). Same for LastWriteTime.
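
A minimal sketch of this suggestion, assuming the cast is done once per file and the resulting object is reused for every property. `Get-FileMetadata` is a hypothetical helper name used only for illustration; it is not part of the script in the question:

```powershell
# Hypothetical helper: build one FileInfo object per path and read
# all of the metadata properties from that single object.
function Get-FileMetadata {
    param (
        [Parameter(Mandatory = $true)]
        [string]$FullName
    )
    # Casting a string to FileInfo calls the FileInfo(string) constructor
    # directly, without going through the Get-Item cmdlet.
    $info = [System.IO.FileInfo]$FullName
    [pscustomobject]@{
        FullName      = $info.FullName
        Length        = $info.Length
        Created       = $info.CreationTime
        LastWriteTime = $info.LastWriteTime
    }
}

# Usage with a scratch file (illustrative only).
$tmp = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($tmp, 'hello')
Get-FileMetadata -FullName $tmp
Remove-Item -LiteralPath $tmp
```

Inside the question's hashtable, the same idea would mean assigning `$info` once before `New-Object` and then using `$info.Length`, `$info.CreationTime`, and `$info.LastWriteTime` in place of the three `Get-Item` calls.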

TheMadTechnician