
I have this script, which parses all shares on a file server to gather information on share size, ACLs, and counts of files and folders. The script works great on smaller file servers, but on hosts with large shares it consumes all of the RAM and crashes the host. I can't figure out how to optimize the Get-ChildItem portion of the script so that it doesn't consume all of the RAM.

I found a few articles which mention using a foreach loop and piping out only what I need, but I am a PowerShell beginner and can't figure out how to get it to work like that. What can I try next?

$ScopeName     = Read-Host "Enter scope name to gather data on"
$SavePath      = Read-Host "Path to save results and log to"
$SaveCSVPath   = "$SavePath\ShareData.csv"
$TranscriptLog = "$SavePath\Transcript.log"

Write-Host
Start-Transcript -Path $TranscriptLog

$StartTime = Get-Date
$Start     = $StartTime | Select-Object -ExpandProperty DateTime

$Exclusions = {$_.Description -ne "Remote Admin" -and $_.Description -ne "Default Share" -and $_.Description -ne "Remote IPC" }
$FileShares = Get-SmbShare -ScopeName $ScopeName | Where-Object $Exclusions
$Count      = $FileShares.Count
Write-Host
Write-Host "Gathering data for $Count shares" -ForegroundColor Green
Write-Host
Write-Host "Results will be saved to $SaveCSVPath" -ForegroundColor Green
Write-Host

ForEach ($FileShare in $FileShares)
{
    $ShareName = $FileShare.Name
    $Path      = $Fileshare.Path

    Write-Host "Working on: $ShareName - $Path" -ForegroundColor Yellow
    
    $GetObjectInfo = Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue

    $ObjSize = $GetObjectInfo | Measure-Object -Property Length -Sum -ErrorAction SilentlyContinue

    $ObjectSizeMB = "{0:N2}" -f ($ObjSize.Sum / 1MB)
    $ObjectSizeGB = "{0:N2}" -f ($ObjSize.Sum / 1GB)
    $ObjectSizeTB = "{0:N2}" -f ($ObjSize.Sum / 1TB)

    $NumFiles   = ($GetObjectInfo | Where-Object {-not $_.PSIsContainer}).Count
    $NumFolders = ($GetObjectInfo | Where-Object {$_.PSIsContainer}).Count
    
    $ACL            = Get-Acl -Path $Path
    $LastAccessTime = Get-ItemProperty $Path | Select-Object -ExpandProperty LastAccessTime
    $LastWriteTime  = Get-ItemProperty $Path | Select-Object -ExpandProperty LastWriteTime

    $Table = [PSCustomObject]@{
        'ScopeName'          = $FileShare.ScopeName
        'Sharename'          = $ShareName
        'SharePath'          = $Path
        'Owner'              = $ACL.Owner
        'Permissions'        = $ACL.AccessToString
        'LastAccess'         = $LastAccessTime
        'LastWrite'          = $LastWriteTime
        'Size (MB)'          = $ObjectSizeMB
        'Size (GB)'          = $ObjectSizeGB
        'Size (TB)'          = $ObjectSizeTB
        'Total File Count'   = $NumFiles
        'Total Folder Count' = $NumFolders
        'Total Item Count'   = $GetObjectInfo.Count
    }

    $Table | Export-CSV -Path $SaveCSVPath -Append -NoTypeInformation 
}

$EndTime = Get-Date
$End     = $EndTime | Select-Object -ExpandProperty DateTime

Write-Host
Write-Host "Script start time: $Start" -ForegroundColor Green
Write-Host "Script end time: $End" -ForegroundColor Green

Write-Host
$ElapsedTime = $(($EndTime-$StartTime))
Write-Host "Elapsed time: $($ElapsedTime.Days) Days $($ElapsedTime.Hours) Hours $($ElapsedTime.Minutes) Minutes $($ElapsedTime.Seconds) Seconds $($ElapsedTime.MilliSeconds) Milliseconds" -ForegroundColor Cyan

Write-Host
Write-Host "Results saved to $SaveCSVPath" -ForegroundColor Green

Write-Host
Write-Host "Transcript saved to $TranscriptLog" -ForegroundColor Green

Write-Host
Stop-Transcript
ShaynG
  • Look up the details on PowerShell parallel processing and PowerShell jobs. You are asking for a ton of detail that must be acted upon for every share, folder, subfolder, etc. You must expect that this will take a long time regardless of how you try to tune it. You could literally have dozens, hundreds, or thousands of these to check. – postanote Aug 12 '22 at 03:49
  • Thanks @postanote. I'm not really concerned with how long the script takes; I'm more concerned about the script consuming all the RAM on the host, at which point I either get out-of-memory exceptions or the host goes unresponsive and I have to reboot or kill the script. Thanks for the suggestion on looking up PowerShell parallel processes, I'll see what I can find. – ShaynG Aug 12 '22 at 13:10
  • @postanote, I was also thinking in the `-parallel` direction, but it is actually an incorrect suggestion: it might improve performance, but it will likely use even more memory for all the parallel threads running simultaneously. – iRon Aug 12 '22 at 14:33

2 Answers


To correctly use the PowerShell pipeline (and preserve memory, as each item is streamed separately), use the PowerShell ForEach-Object cmdlet (unlike the foreach statement), avoid assigning the pipeline to a variable (as you are doing with $FileShares = ...), and don't put parentheses ((...)) around the pipeline:

Get-SmbShare -ScopeName $ScopeName | Where-Object $Exclusions | ForEach-Object {

Then replace every $FileShare variable in your loop with the current-item variable $_ (e.g. $FileShare.Name becomes $_.Name).
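For illustration, a minimal sketch of how the outer loop might look once it is streamed through the pipeline; it reuses the variable names from the question and only outlines the structure, with the remaining columns gathered as described below:

Get-SmbShare -ScopeName $ScopeName | Where-Object $Exclusions | ForEach-Object {
    # $_ is the current share object streamed from Get-SmbShare
    Write-Host "Working on: $($_.Name) - $($_.Path)" -ForegroundColor Yellow

    [PSCustomObject]@{
        'ScopeName' = $_.ScopeName
        'ShareName' = $_.Name
        'SharePath' = $_.Path
        # ... size, count, and ACL columns as in the original script ...
    } | Export-Csv -Path $SaveCSVPath -Append -NoTypeInformation
}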

For the Get-Childitem part you might do the same thing (stream! meaning: use the mighty PowerShell pipeline rather than piling everything up in $GetObjectInfo):

$ObjSize = Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue |
    Measure-Object -Property Length -Sum -ErrorAction SilentlyContinue

As an aside, you might simplify your 3 size properties into a single, smarter size property; see: How to convert value to KB, MB, or GB depending on digit placeholders?
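For illustration, a rough sketch of such a combined size property; the Format-Size helper name and the unit cut-offs are assumptions, not taken from the linked answer:

function Format-Size {
    param([long]$Bytes)
    # Return the size formatted with the largest unit that keeps the number readable
    switch ($Bytes) {
        { $_ -ge 1TB } { return '{0:N2} TB' -f ($Bytes / 1TB) }
        { $_ -ge 1GB } { return '{0:N2} GB' -f ($Bytes / 1GB) }
        { $_ -ge 1MB } { return '{0:N2} MB' -f ($Bytes / 1MB) }
        default        { return '{0:N2} KB' -f ($Bytes / 1KB) }
    }
}

# Usage inside the [PSCustomObject], replacing the three separate size columns:
# 'Size' = Format-Size $ObjSize.Sum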

Addition
"But isn't putting everything into $ObjSize just swapping one variable for another?"
No, it is not. Think of the PowerShell pipeline as an assembly line: at the first station you take the information for a single file and pass it on to the next (and last) station, where only its Length property is added to the running sum, after which the current (file) object can be discarded.
In your question's example, on the other hand, you read the information of all files at once, store it in $GetObjectInfo, and then walk the whole list just to add up the Length property of those (quite heavy) PowerShell file objects.

But why not try it for yourself?

Open a new PowerShell session and run:

$Path = '.'
$GetObjectInfo = Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue
$ObjSize = $GetObjectInfo | Measure-Object -Property Length -Sum -ErrorAction SilentlyContinue
Get-Process -ID $PID

Now, open a new session again and use the PowerShell pipeline:

$Path = '.'
$ObjSize = Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue |
           Measure-Object -Property Length -Sum -ErrorAction SilentlyContinue
Get-Process -ID $PID

Notice the difference in memory usage (WS(M)).

iRon
  • Thank you for your comments. I will look into implementing all of this; however, I don't see how this will help with the Get-ChildItem portion consuming all the RAM on the host. The `$FileShares =` line is only getting a list of shares on the host for use in the foreach loop. I'll look into converting the foreach to ForEach-Object, though. – ShaynG Aug 12 '22 at 13:16
  • Sorry, I missed that part. I was too focused on giving a general answer. I have updated the answer. – iRon Aug 12 '22 at 14:27
  • Thank you again! But isn't putting everything into `$ObjSize` just swapping one variable for another? In your example, if I'm reading it correctly, it's just taking out `$GetObjectInfo` and putting everything into `$ObjSize` instead. – ShaynG Aug 15 '22 at 15:16
  • No, it is not; that is what I am trying to tell you (see my **addition** in the answer). – iRon Aug 15 '22 at 16:57
  • Thank you for providing more detail. I did try this in my script, putting everything on one line as in your example, and it didn't consume RAM like it did before: on a share with 6 million files it only took one hour to complete and used about 40 MB of RAM, amazing! However, it wasn't able to capture the folder count; it got everything else but the folder count. I remember now that the reason I broke the script up this way and measured `$ObjSize` separately was so I could get all those counts. – ShaynG Aug 15 '22 at 18:53
  • I see; in that case I would use `ForEach-Object` and do the sums and counts within it, something like: `{ $Length += $_.Length; if ($_.PSIsContainer) { $FolderCount++ } else { $FileCount++ } }` (you will need to initialize the counters to `0`); see the sketch after these comments. – iRon Aug 15 '22 at 19:28
  • This last suggestion worked beautifully. The script used a minimal amount of RAM and finished in under an hour. I really appreciate your time and patience in assisting me here; I learned a lot from your replies. – ShaynG Aug 16 '22 at 22:59
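For illustration, a rough sketch of the ForEach-Object accumulator approach from iRon's last comment above, using the counter names from the original script (the asker reports that a variant of this kept memory usage minimal on a share with 6+ million files):

$ObjSize    = 0L
$NumFiles   = 0
$NumFolders = 0

Get-ChildItem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue | ForEach-Object {
    # Each file or folder object is processed and then discarded, so only the counters stay in memory
    if ($_.PSIsContainer) { $NumFolders++ }
    else                  { $NumFiles++; $ObjSize += $_.Length }
}

# $ObjSize now holds the total size in bytes, so use it in place of $ObjSize.Sum when formatting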

You are buffering the entire collection of [FileSystemInfo] on $FileShare into a variable with...

$GetObjectInfo = Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue

So, if there are a million directories and files on that share, then that's a million [FileSystemInfo] instances stored in a million-element array, none of which can be garbage collected during that iteration of the foreach loop. You can use Group-Object to improve that a bit...

$groupsByPSIsContainer = Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue |
    Group-Object -Property 'PSIsContainer' -AsHashTable
# $groupsByPSIsContainer is a [Hashtable] with two keys:
#     - $true gets the collection of directories
#     - $false gets the collection of files

$ObjSize = $groupsByPSIsContainer[$false] | Measure-Object -Property Length -Sum -ErrorAction SilentlyContinue

$NumFiles   = $groupsByPSIsContainer[$false].Count
$NumFolders = $groupsByPSIsContainer[$true].Count

...but that still ends up storing all of the [FileSystemInfo]s in the two branches of the [Hashtable]. Instead, I would just enumerate and count the results myself...

$ObjSize    = 0L # Stores the total file size directly; use $ObjSize instead of $ObjSize.Sum
$NumFiles   = 0
$NumFolders = 0

foreach ($fileSystemInfo in Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue)
{
    if ($fileSystemInfo.PSIsContainer)
    {
        $NumFolders++
    }
    else
    {
        $NumFiles++
        $ObjSize += $fileSystemInfo.Length
    }
}

That stores only the current enumeration result in $fileSystemInfo and never the entire sequence.
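For completeness, a sketch of how those three counters could feed back into the per-share record from the question; note that $ObjSize here already holds the byte total, so it replaces $ObjSize.Sum:

$Table = [PSCustomObject]@{
    'Sharename'          = $ShareName
    'SharePath'          = $Path
    # ... Owner, Permissions, LastAccess, and LastWrite stay exactly as in the original script ...
    'Size (MB)'          = "{0:N2}" -f ($ObjSize / 1MB)
    'Size (GB)'          = "{0:N2}" -f ($ObjSize / 1GB)
    'Size (TB)'          = "{0:N2}" -f ($ObjSize / 1TB)
    'Total File Count'   = $NumFiles
    'Total Folder Count' = $NumFolders
    'Total Item Count'   = $NumFiles + $NumFolders
}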

Note that if you weren't summing the files' sizes Group-Object would work well...

$groupsByIsContainer = Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue |
    Group-Object -Property 'PSIsContainer' -NoElement

$NumFiles   = ($groupsByIsContainer | Where-Object -Property 'Name' -EQ -Value $false).Count
$NumFolders = ($groupsByIsContainer | Where-Object -Property 'Name' -EQ -Value $true ).Count

-NoElement prevents the resulting group objects from storing the grouped elements; we just care about the count of members in each grouping, not the members themselves. If we passed -AsHashTable then we'd lose the convenient Count property, which is why the two groups have to be accessed in this slightly awkward way.

Lance U. Matthews
  • Thank you for this insight! So informative, and I'll look into modifying my code in this way. Just one follow-up: is the `L` at the end of this line a typo? **$ObjSize = 0L** – ShaynG Aug 15 '22 at 15:18
  • The `L` suffix in the numeric literal `0L` makes the integer a `[Long]` (alias for [`[Int64]`](https://docs.microsoft.com/dotnet/api/system.int64)) whereas just `0` would be an `[Int32]`. See [`about_Numeric_Literals`](https://docs.microsoft.com/powershell/module/microsoft.powershell.core/about/about_numeric_literals). Another way to do this would be `[Int64] $ObjSize = 0`, which not only makes the current value of `$ObjSize` an `[Int64]` but also prevents it from ever storing anything _but_ an `[Int64]` (e.g. subsequently running `$ObjSize = 'abc'` or `$ObjSize = Get-Date` will fail). – Lance U. Matthews Aug 15 '22 at 17:42
  • Ah I see. Thanks for the explanation. I tried your suggestion to 'enumerate and count the results myself' but unfortunately it still consumed all the ram on the host. I had to kill the script. – ShaynG Aug 15 '22 at 20:20
  • I don't know why that would be. You replaced the lines starting with `$GetObjectInfo = Get-Childitem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue` and ending with `$NumFolders = ($GetObjectInfo | Where-Object {$_.PSIsContainer}).Count` in your script with only my snippet above containing the `foreach` loop, right? – Lance U. Matthews Aug 15 '22 at 20:30
  • Yes, that's correct. Actually I only used your foreach snippet you provided and hardcoded the path instead of using the variable to test. RAM usage went up to 14GB before I killed the script. – ShaynG Aug 15 '22 at 20:34
  • How many files are on the share that causes the high memory usage? If it's millions of files then there's no (easy) way to avoid millions of `[IO.DirectoryInfo]` and `[IO.FileInfo]` instances being created. How much memory does this server have? If 14 GB isn't most/all available memory, maybe the garbage collector is electing not to immediately step in to clean up; I suppose you could add `$iteration = 0` before the `foreach ($fileSystemInfo in Get-Childitem ...)` loop and `if (++$iteration % 10000 -eq 0) { [GC]::Collect() }` inside the same loop to prevent too much garbage from accumulating (see the sketch after these comments). – Lance U. Matthews Aug 15 '22 at 21:38
  • After modifying my script as @iRon suggested, I was able to get it to run using minimal memory, and it only took about 45 minutes to complete on the largest file share. I found the file share I was stuck on had 6+ million files. The host has 16 GB of RAM. I'll test out your garbage-collection method and see if that helps. I do appreciate your time and efforts; I've learned so much from this post. – ShaynG Aug 16 '22 at 15:44
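For illustration, a minimal sketch of that periodic garbage-collection idea combined with the counting loop from the answer above; the 10,000-item interval is simply the figure suggested in the comment, not a tuned value:

$ObjSize    = 0L
$NumFiles   = 0
$NumFolders = 0
$iteration  = 0

foreach ($fileSystemInfo in Get-ChildItem -Path $Path -Recurse -Force -ErrorAction SilentlyContinue)
{
    if ($fileSystemInfo.PSIsContainer)
    {
        $NumFolders++
    }
    else
    {
        $NumFiles++
        $ObjSize += $fileSystemInfo.Length
    }

    # Every 10,000 items, ask .NET to collect garbage so discarded FileInfo/DirectoryInfo objects don't pile up
    if (++$iteration % 10000 -eq 0) { [GC]::Collect() }
}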