
New to PowerShell, so kind of learning by doing.

The process I have created works, but it ends up locking down my machine until it completes, eating up all the memory. I thought I had fixed this by forcing the garbage collector, and also by moving from a foreach statement to a %{} (ForEach-Object) loop.

Quick synopsis of the process: I need to merge multiple SharePoint log files into single ones to track usage across all of the company's different SharePoint sites. PowerShell loops through all log directories on the SP server and, for each file in a directory, checks whether it already exists on my local machine. If it does exist, it appends the file's text; otherwise it does a straight copy. Rinse and repeat for each file and directory on the SharePoint log server. Between each loop I'm forcing the GC because... well, because my basic understanding is that the loop variables are held in memory and I want to flush them. I'm probably looking at this all wrong. So here is the script in question.

$FinFiles = 'F:\Monthly Logging\Logs'

dir -Path '\\SP-Log-Server\Log-Directory' | ?{ $_.PSIsContainer } | %{
    $CurrentDir = $_
    dir $CurrentDir.FullName | ?{ -not $_.PSIsContainer } | %{
        if($_.Extension -eq ".log"){
            $DestinationFile = $FinFiles + '\' + $_.Name
            if((Test-Path $DestinationFile) -eq $false){
                # Destination doesn't exist yet: straight copy.
                New-Item -ItemType file -Path $DestinationFile -Force
                Copy-Item $_.FullName $DestinationFile
            }
            else{
                # Destination exists: append this log's contents.
                $A = Get-Content $_.FullName ; Add-Content $DestinationFile $A
                Write-Host "Log File"$_.FullName"merged."
            }
        }
        [GC]::Collect()
    }
    [GC]::Collect()
}

Granted, the completed/appended log files get very, very large (min 300 MB, max 1 GB). Am I not closing something I should be, or keeping something open in memory? (It is currently sitting at 7.5 GB of my 8 GB of memory total.)

Thanks in advance.

JHStarner
  • You already have 2 good answers with good helpful content. Here I offer you something else, just a comment on your code. While the real solution will probably let you remove any use of [GC]::Collect(), I'm pretty sure that in the code you provided you only need the first call to [GC]::Collect(), the one in the inner loop. The outer call to [GC]::Collect() (the second one) is only reached either right after the first one or after no files were processed, so it's basically not doing anything. – jimhark Sep 21 '15 at 23:29
  • Thanks for the call-out. The reason I had the two was one per loop; I didn't know if it would clear out each loop's items from memory. But thanks to @TheMadTechnician I was able to get a final product without the garbage collector. – JHStarner Sep 22 '15 at 15:42

2 Answers


Don't nest Get-ChildItem commands like that. Use wildcards instead: try dir "\\SP-Log-Server\Log-Directory\*\*.log". That should improve things to start with. Then move this to a ForEach($X in $Y){} loop instead of the ForEach-Object{} loop you're using now. I'm betting that takes care of your problem.

So, re-written just off the top of my head:

$FinFiles = 'F:\Monthly Logging\Logs'

ForEach($LogFile in (dir -Path '\\SP-Log-Server\Log-Directory\*\*.log')){
    $DestinationFile = $FinFiles + '\' + $LogFile.Name
    if((Test-Path $DestinationFile) -eq $false){
        New-Item -ItemType file -Path $DestinationFile -Force
        Copy-Item $LogFile.FullName $DestinationFile
    }
    else{
        $A = Get-Content $LogFile.FullName ; Add-Content $DestinationFile $A
        Write-Host "Log File"$LogFile.FullName"merged."
    }
}

Edit: Oh, right, Alexander Obersht may be quite right as well; you may well benefit from a StreamReader approach. At the very least you should use the -ReadCount parameter of Get-Content, and there's no reason to save the content to a variable, just pipe it right to the Add-Content cmdlet.

Get-Content $LogFile.FullName -ReadCount 5000 | Add-Content $DestinationFile

To explain my answer a little more, if you use ForEach-Object in the pipeline it keeps everything in memory (regardless of your GC call). Using a ForEach loop does not do this, and should take care of your issue.
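For reference, here is a minimal side-by-side sketch of the two loop shapes being compared, reusing the paths from the question (the loop body is cut down to a plain copy just to keep the comparison short):

$FinFiles = 'F:\Monthly Logging\Logs'

# Pipeline form (ForEach-Object, aliased as %): each item coming through the
# pipeline is handed to the script block as $_.
dir -Path '\\SP-Log-Server\Log-Directory\*\*.log' | ForEach-Object {
    Copy-Item $_.FullName (Join-Path $FinFiles $_.Name)
}

# Statement form (foreach): the dir expression is evaluated up front and the
# loop then walks the collected results with a named variable.
ForEach ($LogFile in (dir -Path '\\SP-Log-Server\Log-Directory\*\*.log')) {
    Copy-Item $LogFile.FullName (Join-Path $FinFiles $LogFile.Name)
}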

TheMadTechnician
  • I'll give this a shot. I thought I did `ForEach($X in $Y)` to begin with, but I'll double check the previous version when I get back to work tomorrow. Main goal is to be able to still work on my machine whilst this runs. – JHStarner Sep 21 '15 at 21:50
  • This seems to be working best. I do still see an uptick in my memory usage, but it does drop when it finishes a file. So I am still able to work on this machine whilst the append/concatenate happens. Thanks! – JHStarner Sep 22 '15 at 15:41
  • And I may have spoken too soon. I did just what you put up in the main code block, and it worked until I got to roughly the same spot as the last run, capping at ~7.5 GB of my 8 GB of RAM. So memory is still not clearing after each document is run. – JHStarner Sep 22 '15 at 15:57
  • Third time's a charm, maybe. I removed the `$A` variable and added the `-ReadCount` you mentioned. It seems to be keeping memory clear this time. No ramp-up in use at all. – JHStarner Sep 22 '15 at 16:20

You might find this and this helpful.

In short: Add-Content, Get-Content and Out-File are convenient but notoriously slow when you need to deal with large amounts of data or many I/O operations. You want to fall back to the StreamReader and StreamWriter .NET classes for performance and/or memory usage optimization in cases like yours.

Code sample:

$sInFile = "infile.txt"
$sOutFile = "outfile.txt"

$oStreamReader = New-Object -TypeName System.IO.StreamReader -ArgumentList @($sInFile)
# $true sets append mode.
$oStreamWriter = New-Object -TypeName System.IO.StreamWriter -ArgumentList @($sOutFile, $true)

# ReadLine() returns one line per call and $null at the end of the stream,
# so read in a loop rather than enumerating a single call.
while (($sLine = $oStreamReader.ReadLine()) -ne $null) {
    $oStreamWriter.WriteLine($sLine)
}

$oStreamReader.Close()
$oStreamWriter.Close()
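Adapted to the log-merging loop from your question, a rough sketch might look like the following (paths taken from your question; treat it as a starting point rather than a tested drop-in):

$FinFiles = 'F:\Monthly Logging\Logs'

foreach ($LogFile in (Get-ChildItem -Path '\\SP-Log-Server\Log-Directory\*\*.log')) {
    $DestinationFile = Join-Path $FinFiles $LogFile.Name

    $oStreamReader = New-Object -TypeName System.IO.StreamReader -ArgumentList @($LogFile.FullName)
    # $true opens the destination in append mode; the file is created if it does not
    # exist yet, which covers both the copy and the append cases from the question.
    $oStreamWriter = New-Object -TypeName System.IO.StreamWriter -ArgumentList @($DestinationFile, $true)

    # Stream line by line so only one line is held in memory at a time.
    while (($sLine = $oStreamReader.ReadLine()) -ne $null) {
        $oStreamWriter.WriteLine($sLine)
    }

    $oStreamReader.Close()
    $oStreamWriter.Close()
}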
Alexander Obersht