
I am working on this bit of code that helps clean up a larger CSV file. It takes 7 seconds when measuring the ForEach statement. How do I speed this up?

Add-Type -AssemblyName System.Windows.Forms
$FileBrowser = New-Object System.Windows.Forms.OpenFileDialog -Property @{ InitialDirectory = [Environment]::GetFolderPath('Desktop') }
$null = $FileBrowser.ShowDialog()
$FilePath = $FileBrowser.FileName
$export = import-csv $FilePath
$exportFinalHeading = "Id,Type,Name,SpaceId,SpaceName,NetworkSwitchName,NetworkSwitchPortID,ExternalID,SerialNumber"
Add-Content -Path ".\ExportFinal.csv" -Value $exportFinalHeading
Measure-Command {ForEach ($export in $export){
   $id = $($export.id) + ","
   $Type = $($export.Type) + ","
   $Name = $($export.Name) + ","
   $SpaceId = $($export.SpaceId) + ","
   $SpaceName = $($export.SpaceName) + ","
   $NetworkSwitchName = $($export.NetworkSwitchName) + ","
   $NetworkSwitchPortID = $($export.NetworkSwitchPortId) + ","
   $ExternalID = $($export.ExternalId) + ","
   $ExternalID = $ExternalID.replace(':',',')
   $exportFinal = $id + $Type + $Name + $SpaceId + $SpaceName + $NetworkSwitchName + $NetworkSwitchPortID + $ExternalID

   Add-Content -Path ".\ExportFinal.csv" -Value $exportFinal
}}

I have tried measuring just a straight pass-through of the text (write and rewrite), but the goal is to pull specific data out.

  • Why are you creating a CSV manually instead of using `Export-Csv`? – Santiago Squarzon Mar 21 '23 at 02:00
  • I have a CSV that I am passing in, pulling specific columns out of, and then attempting to pass that information out – Broc Christensen Mar 21 '23 at 02:02
  • 1
    Yes I know, still, why are you not using `Export-Csv` ? This is perfectly doable with it and will do it much more efficient than what you currently. If you want help with how to do it with the cmdlet then provide an example of the input CSV and the output you're expecting – Santiago Squarzon Mar 21 '23 at 02:05
  • 2
    In any case the biggest problem is using `Add-Content` in every loop iteration instead of streaming the output through the pipeline to `Set-Content` – Santiago Squarzon Mar 21 '23 at 02:23
  • Or even save the whole thing to a variable first if you have the memory. – js2010 Mar 21 '23 at 03:40
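A minimal sketch of what the comments suggest with `Export-Csv`, assuming the input file carries the columns from the question and that ExternalId embeds the serial number after a colon (which is what the `.replace(':',',')` in the question implies; the calculated-property names here are illustrative):

Import-Csv $FilePath |
    Select-Object Id, Type, Name, SpaceId, SpaceName,
        NetworkSwitchName, NetworkSwitchPortId,
        @{ Name = 'ExternalId';   Expression = { ($_.ExternalId -split ':')[0] } },
        @{ Name = 'SerialNumber'; Expression = { ($_.ExternalId -split ':')[1] } } |
    Export-Csv .\ExportFinal.csv -NoTypeInformation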

2 Answers


The reason, as Santiago Squarzon pointed out, is that file IO in a loop is expensive. When Add-Content is called, it does several things. First it acquires a lock on the file so that no one else can mess with its contents at the same time. Then the content is written, the filesystem metadata is updated (the file size has changed, and so have the last access and last write times), and finally the lock is released. An antivirus is also likely to intercept the operation to make sure there isn't any malware monkey business going on. Each of these adds just a bit. Repeat that 10 000 times, and the little extras increase ten-thousand-fold too.

Here's an example from PowerShell 7.3.3 on macOS. The first one uses a StringBuilder to store the data, then writes all the content in one operation. StringBuilder is a data structure that supports mutable content and whose size is easily changed. That's quite unlike the ordinary String object in .NET: Strings are immutable. The second one is akin to your approach; it writes each value to the file inside the loop.

The execution times are easily tested with test statements like so:

$max = somevalue

Measure-Command {
  $sb = New-Object System.Text.StringBuilder
  foreach ($i in 0..$max) { $null = $sb.AppendLine($i) }
  Set-Content -Path buffered.txt -Value $sb.ToString()
}

Measure-Command {
  foreach ($i in 0..$max) { Add-Content -Path rowbyrow.txt -Value $i }
}

For $max = 10000, 20000, 50000 and 100000, the values are (first line buffered, second line row-by-row):

$max = 10000
TotalMilliseconds :   119,6241   # StringBuilder + Set-Content
TotalMilliseconds :  5201,4771   # Add-Content per row

$max = 20000
TotalMilliseconds :   199,1188
TotalMilliseconds : 10567,1563

$max = 50000
TotalMilliseconds :   431,2403
TotalMilliseconds : 27068,5792

$max = 100000
TotalMilliseconds :   848,9176
TotalMilliseconds : 55165,0826

As you can see, the execution times scale linearly. In the beginning, the difference is "only" one order of magnitude. At 50 000 rows, the row-by-row version is two orders of magnitude slower.
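A middle ground, if holding everything in memory is a concern: keep the file handle open yourself, so the lock/metadata/unlock cycle described above happens only once for the whole run. A minimal sketch, assuming the same 0..$max workload (the file name is illustrative):

# Open the file once; .NET resolves relative paths against its own
# working directory, so build an absolute path from $PWD.
$writer = [System.IO.StreamWriter]::new((Join-Path $PWD 'streamed.txt'))
try {
    foreach ($i in 0..$max) { $writer.WriteLine($i) }
}
finally {
    $writer.Dispose()  # flushes the buffer and releases the file handle
}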

vonPryz

To complement @vonPryz's answer (which is probably the fastest) with how you should use the PowerShell pipeline, which is actually what @Santiago mentioned.

There are a few reasons why you might want to use the PowerShell pipeline:

  • Quote: Many of the techniques described in PowerShell scripting performance considerations aren't idiomatic PowerShell and may reduce the readability of a PowerShell script. Script authors are advised to use idiomatic PowerShell unless performance dictates otherwise.
  • The PowerShell Pipeline cmdlets take care of initializing and cleaning up process items
  • Solutions that (correctly) use the PowerShell Pipeline use less memory

As stated in the PowerShell scripting performance considerations document, avoid wrapping cmdlet pipelines. This means that the Add-Content should go outside the foreach loop:

$(ForEach ($export in $export){ ... }) |Set-Content ".\ExportFinal.csv"

Notes:

  • In this example, I am not using Add-Content but Set-Content, as that cmdlet takes care of starting a new file once (and not on each iteration), meaning that I do not have to clear the file upfront.
  • To collect your language statement's output up front before sending it to the pipeline (still object by object), enclose it in $(...), the subexpression operator (see also this answer from mklement0).

But that will still use a lot of memory, therefore you should use the Foreach-Object cmdlet instead (even though it has a little more overhead than the foreach statement):

$export |Foreach-Object { ... } |Set-Content ".\ExportFinal.csv"

Or even better, one-at-a-time processing:

Import-Csv $FilePath |Foreach-Object { ... } |Set-Content ".\ExportFinal.csv"
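Filling in the { ... } with the transformation from the question, a sketch could look like this (assuming, as the original `.replace(':',',')` does, that ExternalId embeds the serial number after a colon):

$heading = 'Id,Type,Name,SpaceId,SpaceName,NetworkSwitchName,NetworkSwitchPortID,ExternalID,SerialNumber'
Import-Csv $FilePath | ForEach-Object -Begin { $heading } -Process {
    # Rebuild each row as one comma-separated line; replacing the colon
    # in ExternalId yields the last two fields (ExternalID,SerialNumber).
    $_.Id, $_.Type, $_.Name, $_.SpaceId, $_.SpaceName,
    $_.NetworkSwitchName, $_.NetworkSwitchPortId,
    $_.ExternalId.Replace(':', ',') -join ','
} | Set-Content .\ExportFinal.csv

Because everything streams, the output file is opened once and only one row is held in memory at a time.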

Benchmark

$max = 10000

measure-command { # 7550 milliseconds
    Clear-Content -path .\Test1.txt
    foreach ($i in 0..$max) { Add-Content -path .\Test1.txt -value $i }
} 

measure-command { # 98 milliseconds
     $(foreach ($i in 0..$max) { $i }) |Set-Content -path .\Test2.txt
} 

measure-command { # 143 milliseconds
     0..$max |Foreach-Object { $_ } |Set-Content -path .\Test3.txt
} 

If you find yourself in a situation where you want to output to e.g. multiple files in a single loop, you might read Mastering the (steppable) pipeline for background and a quick PowerShell solution.

iRon