I'm a greenhorn in PowerShell, but I have successfully created a script for processing a lot of .csv files. It opens each csv, looks for a value on a certain line, and at the end exports the results into a new .csv file.
It works, but it is very slow (5,000 csv files take about 5 hours), so I came up with optimized code: my idea was to read all 5,000 csv files into memory first and then do the processing (looking up the value and exporting it).
With 10 test csv files it worked great (11x faster than the original code). But with 5,000 csv files it is 6x slower than the original.
I have already increased PowerShell's memory limit to 32 GB (the machine has 256 GB in total) and also raised the variable limit from 4096 to 32768 because of the dynamically generated variables.
Does anyone have an idea why my code works really well with only 10 csv files but is far slower with 5,000?
The original (slow) code opened each csv on every pass through the loop; the new (fast) code loads them all up front, so inside the loop it only reads the csv data from memory.
Maybe somebody can give me a hint about what is generally wrong with my optimization.
My code:
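For context, the original (slow) approach looked roughly like this — a simplified, self-contained sketch with a throwaway folder and three tiny fake files instead of my real data, to show the disk read happening on every pass:

```powershell
# Simplified sketch of the ORIGINAL approach: the csv for each frequency
# is re-imported from disk on every pass through the loop.
$dir = Join-Path ([IO.Path]::GetTempPath()) 'csvdemo'
New-Item -ItemType Directory -Force -Path $dir | Out-Null

# fake input: one file per frequency, same layout as the real files
foreach ($freq in 0..2)
{
    $name = ($freq.ToString() + '.csv').PadLeft(10, '0')
    "Pressure [MPa]`n$($freq * 10)" | Set-Content (Join-Path $dir $name)
}

$OutArray = @()
for ($freq = 0; $freq -le 2; $freq++)
{
    $name = ($freq.ToString() + '.csv').PadLeft(10, '0')
    $data = Import-Csv -Path (Join-Path $dir $name) -Delimiter ';'   # disk read on EVERY pass
    $myobj = "" | Select-Object frequency, amplitude
    $myobj.frequency = $freq
    $myobj.amplitude = $data[0].'Pressure [MPa]'
    $OutArray += $myobj
}
$OutArray | ForEach-Object { "{0};{1}" -f $_.frequency, $_.amplitude }
Remove-Item $dir -Recurse -Force
```

(The file names and the 'Pressure [MPa]' column match my real data; everything else here is made up just to have something runnable.)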
cd e:\temp6
$MaximumVariableCount = 32768
$path = ".\export2"
if (!(Test-Path $path))
{
    New-Item -ItemType Directory -Force -Path $path | Out-Null
}
(Get-Date).ToString('T')
$data = Import-Csv -Path .\000000.csv -Delimiter ';'
$nodes_number = $data.Count
echo ("number of nodes: " + $nodes_number)
$data = $null
$freq_step = 1
$final_freq = 5000
$StartTime = Get-Date
#loading csv files into RAM to save time
for ($freq = 0; $freq -le $final_freq; $freq += $freq_step)
{
    $filename = ($freq.ToString() + '.csv').PadLeft(10, '0')
    $data = Import-Csv -Path $filename -Delimiter ';'
    Set-Variable -Name "var$freq" -Value $data -Scope global
    $data = $null
}
$elapsedTime = (Get-Date) - $StartTime
$totalTime = $elapsedTime.ToString('hh\:mm\:ss')
echo ("load time: " + $totalTime)
#processing csv from RAM (already stored in RAM)
for ($i = 0; $i -lt $nodes_number; $i++)
{
    echo ("finished: " + [math]::Round(($i / $nodes_number * 100), 2) + "%")
    $OutArray = @()
    for ($freq = 0; $freq -le $final_freq; $freq += $freq_step)
    {
        # at this step I was reading the csv from the hard drive, but with my
        # optimization I'm now reading the parsed data directly from RAM
        $data = Get-Variable -Name "var$freq" -ValueOnly
        $myobj = "" | Select-Object "frequency", "amplitude"
        $myobj.frequency = $freq
        $myobj.amplitude = $data[$i].'Pressure [MPa]'
        $OutArray += $myobj
        $myobj = $null
    }
    $exportname = $i.ToString() + 'r.csv'
    $OutArray | Export-Csv .\export2\$exportname -NoTypeInformation -Delimiter ';'
    $OutArray = $null
}
$elapsedTime = (Get-Date) - $StartTime
$totalTime = $elapsedTime.ToString('hh\:mm\:ss')
echo ("total time ccc: " + $totalTime)
echo ("total time ccc: " + $totalTime) | Out-File .\info.txt -Append
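In case someone wants to poke at the pattern without my data: here is a minimal, self-contained model of my cache-then-process idea, using a hashtable keyed by frequency in place of the Set-Variable/Get-Variable dynamic variables (I assume the two are functionally equivalent; the three-file data set is made up):

```powershell
# Minimal model of the optimization: load everything into RAM first,
# then process purely from memory. A hashtable stands in for var$freq.
$dir = Join-Path ([IO.Path]::GetTempPath()) 'csvdemo2'
New-Item -ItemType Directory -Force -Path $dir | Out-Null

# fake input: one file per frequency, same layout as the real files
foreach ($freq in 0..2)
{
    $name = ($freq.ToString() + '.csv').PadLeft(10, '0')
    "Pressure [MPa]`n$($freq + 100)" | Set-Content (Join-Path $dir $name)
}

# pass 1: cache all files in RAM
$cache = @{}
for ($freq = 0; $freq -le 2; $freq++)
{
    $name = ($freq.ToString() + '.csv').PadLeft(10, '0')
    $cache[$freq] = Import-Csv -Path (Join-Path $dir $name) -Delimiter ';'
}

# pass 2: process from memory only, no disk reads
$OutArray = foreach ($freq in 0..2)
{
    [pscustomobject]@{
        frequency = $freq
        amplitude = $cache[$freq][0].'Pressure [MPa]'
    }
}
$OutArray | ForEach-Object { "{0};{1}" -f $_.frequency, $_.amplitude }
Remove-Item $dir -Recurse -Force
```

With only three files this is obviously fast; the question is why the same two-pass structure collapses at 5,000 files in my real script above.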