0

I'm newbie in Powershell. I tried to process / transpose row-column against a medium size csv based record (around 10000 rows). The original CSV consist of around 10000 rows with 3 columns ("Time","Id","IOT") as below:

"Time","Id","IOT" 
"00:03:56","23","26" 
"00:03:56","24","0" 
"00:03:56","25","0" 
"00:03:56","26","1" 
"00:03:56","27","0" 
"00:03:56","28","0" 
"00:03:56","29","0" 
"00:03:56","30","1953" 
"00:03:56","31","22" 
"00:03:56","32","39" 
"00:03:56","33","8" 
"00:03:56","34","5" 
"00:03:56","35","269" 
"00:03:56","36","5" 
"00:03:56","37","0" 
"00:03:56","38","0" 
"00:03:56","39","0" 
"00:03:56","40","1251" 
"00:03:56","41","103" 
"00:03:56","42","0" 
"00:03:56","43","0" 
"00:03:56","44","0" 
"00:03:56","45","0" 
"00:03:56","46","38" 
"00:03:56","47","14" 
"00:03:56","48","0" 
"00:03:56","49","0" 
"00:03:56","2013","0" 
"00:03:56","2378","0" 
"00:03:56","2380","32" 
"00:03:56","2758","0" 
"00:03:56","3127","0" 
"00:03:56","3128","0" 
"00:09:16","23","22" 
"00:09:16","24","0" 
"00:09:16","25","0" 
"00:09:16","26","2" 
"00:09:16","27","0" 
"00:09:16","28","0" 
"00:09:16","29","21" 
"00:09:16","30","48" 
"00:09:16","31","0" 
"00:09:16","32","4" 
"00:09:16","33","4" 
"00:09:16","34","7" 
"00:09:16","35","382" 
"00:09:16","36","12" 
"00:09:16","37","0" 
"00:09:16","38","0" 
"00:09:16","39","0" 
"00:09:16","40","1882" 
"00:09:16","41","42" 
"00:09:16","42","0" 
"00:09:16","43","3" 
"00:09:16","44","0" 
"00:09:16","45","0" 
"00:09:16","46","24" 
"00:09:16","47","22" 
"00:09:16","48","0" 
"00:09:16","49","0" 
"00:09:16","2013","0" 
"00:09:16","2378","0" 
"00:09:16","2380","19" 
"00:09:16","2758","0" 
"00:09:16","3127","0" 
"00:09:16","3128","0" 
... 
... 
... 

I tried to do the transpose using code based from powershell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be
Basically my powershell code is as below:

$b = @() 
    foreach ($Time in $a.Time | Select -Unique) { 
        $Props = [ordered]@{ Time = $time } 
        foreach ($Id in $a.Id | Select -Unique){ 
            $IOT = ($a.where({ $_.Id -eq $Id -and $_.time -eq $time })).IOT 
            $Props += @{ $Id = $IOT } 
        } 
        $b += New-Object -TypeName PSObject -Property $Props 
    } 
$b | FT -AutoSize 
$b | Out-GridView 

Above code could give me the result as I expected which are all "Id" values will become column headers while all "Time" values will become unique row and "IOT" values as the intersection from "Id" x "Time" as below:

"Time","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","2013","2378","2380","2758","3127","3128" 
"00:03:56","26","0","0","1","0","0","0","1953","22","39","8","5","269","5","0","0","0","1251","103","0","0","0","0","38","14","0","0","0","0","32","0","0","0" 
"00:09:16","22","0","0","2","0","0","21","48","0","4","4","7","382","12","0","0","0","1882","42","0","3","0","0","24","22","0","0","0","0","19","0","0","0" 

While it only involves a few hundreds rows, the result comes out quickly as expected, but the problem now when processing the whole csv file with 10000 rows, the script above 'keep executing' and doesn't seem able to finish for long time (hours) and couldn't spit out any results. So probably if some powershell experts from stackoverflow could help to asses the code above and probably could help to modify to speed up the results?

Many thanks for the advise

gre_gor
  • 6,669
  • 9
  • 47
  • 52
peace888
  • 3
  • 1
  • 5

1 Answers1

0

10000 records is a lot but I don't think it is enough to advise streamreader* and manually parsing the CSV. The biggest thing going against you though is the following line:

$b += New-Object -TypeName PSObject -Property $Props 

What PowerShell is doing here is making a new array and appending that element to it. This is a very memory intensive operation that you are repeating 1000's of times. Better thing to do in this case is use the pipeline to your advantage.

$data = Import-Csv -Path "D:\temp\data.csv"
$headers = $data.ID  | Sort-Object {[int]$_}  -Unique

$data | Group-Object Time | ForEach-Object{
    $props = [ordered]@{Time = $_.Name}
    foreach($header in $headers){
        $props."$header" = ($_.Group | Where-Object{$_.ID -eq $header}).IOT
    }
    [pscustomobject]$props
} |  export-csv d:\temp\testing.csv -NoTypeInformation

$data will be your entire file in memory as an object. Need to get all the $headers that will be the column headers.

Group the data by each Time. Then inside each time object we get the value for every ID. If the ID does not exist during that time then the entry will show as null.

This is not the best way but should be faster than yours. I ran 10000 records in under a minute (51 second average over 3 passes). Will benchmark to show you if I can.

I just ran your code once with my own data and it took 13 minutes. I think it is safe to say that mine performs faster.


Dummy data was made with this logic FYI

1..100 | %{
 $time = get-date -Format "hh:mm:ss"
 sleep -Seconds 1
    1..100 | % {

        [pscustomobject][ordered]@{
            time = $time 
            id = $_
            iot = Get-Random -Minimum 0 -Maximum 7
        } 
    }
} | Export-Csv d:\temp\data.csv -notypeinformation

* Not a stellar example for your case of streamreader. Just pointing it out to show that it is the better way to read large files. Just need to parse string line by line.

Community
  • 1
  • 1
Matt
  • 45,022
  • 8
  • 78
  • 119
  • Many thanks Matt, the code above indeed speed up much way faster. Actually the csv file i mentioned is only one of the daily type one, I may need to process the same thing with hundreds thousands of rows for the monthly csv files (around 20 mbs size per csv file), do you think the 'streamreader' case will be required ? If it is a must, if you don't mind could you please throw some examples using streamreader technique ? – peace888 Nov 25 '15 at 04:19