I'm trying to create a PowerShell script that transposes a CSV file from columns to rows.

I found examples of doing the opposite (converting a row-based CSV to columns), but I found nothing on going from columns to rows. My problem is that I don't know exactly how many columns I'll have. I tried adapting the row-to-column code to do columns-to-rows, but without success.

$a = Import-Csv "input.csv"
$a | FT -AutoSize

$b = @()
foreach ($Property in $a.Property | Select -Unique) {
    $Props = [ordered]@{ Property = $Property }
    foreach ($Server in $a.Server | Select -Unique){ 
        $Value = ($a.where({ $_.Server -eq $Server -and 
                    $_.Property -eq $Property })).Value
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}

$b | FT -AutoSize
$b | Out-GridView
$b | Export-Csv "output.csv" -NoTypeInformation 

For example, my CSV can look like this:

"ID","DATA1"
"12345","11111"
"54321","11111"
"23456","44444"

or this (the number of columns can vary):

"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",

and I would like the script to convert it like this:

"ID","DATA"
"12345","11111"
"12345","22222"
"12345","33333"
"54321","11111"
"23456","44444"
"23456","55555"
Ansgar Wiechers
Francois A
  • Hmmm...so what you are asking for is not exactly what you say you want. Converting the columns to rows would not give the output that you show. What you want is to convert one row per ID to multiple rows per ID. There are a few ways to go about this. Will the ID ***always*** be the first column? – EBGreen Jun 12 '18 at 15:11
  • Yes, it will always be the first column – Francois A Jun 12 '18 at 15:16
  • Also, Is there any chance at all that the data will ever have a comma in a field? – EBGreen Jun 12 '18 at 15:21
  • No, there won't be any comma in a field. – Francois A Jun 12 '18 at 15:23

4 Answers


The trick is to query the members of the table to get the column names. Once you do that, the rest is straightforward:

function Flip-Table {

    Process {

        $Row = $_

        # Get all the columns names, excluding the ID field.
        $Columns = ($Row | Get-Member -Type NoteProperty | Where-Object Name -ne ID).Name

        foreach ($Column in $Columns) {

            if ($Row.$Column) {

                $Properties = [Ordered] @{
                    "ID"   = $Row.ID
                    "DATA" = $Row.$Column
                }

                New-Object PSObject -Property $Properties
            }
        }

        # Garbage collection won't kick in until the end of the script, so
        # invoke it every 100 input rows.

        $Count++

        if (($Count % 100) -eq 0) {
            [System.GC]::GetTotalMemory('forceFullCollection') | Out-Null
        }
    }
}

Import-Csv input.csv | Flip-Table | Export-Csv -NoTypeInformation output.csv
Don Cruickshank
  • Thanks again, this works perfectly. But one of the files I process with this is very big, resulting in huge RAM usage by the script (I actually had to kill some other processes to avoid the script crashing). Is it possible to adapt this to write some of the lines to the output file and then free some memory before resuming? I guess this would slow down the process but avoid needing a high amount of memory when working with huge files and risking a crash. – Francois A Jun 13 '18 at 14:15
  • @FrancoisA I've changed the function so that it streams the data. That should, as far as I know, fix the RAM issue. Note that it's called in a different way in the last line now. – Don Cruickshank Jun 13 '18 at 18:25
  • Unfortunately, even with that new version the high memory usage issue persisted: `Exception of type 'System.OutOfMemoryException' was thrown. At C:\Script\transposeRowsGroups.ps1:19 char:17 + if ($Row.$Column) { + + CategoryInfo : OperationStopped: (:) [], OutOfMemoryException + FullyQualifiedErrorID : System.OutOfMemoryException` – Francois A Jun 15 '18 at 20:12
  • OK - I've just read up on garbage collection in PowerShell and added a manual garbage collection step. I'm hoping that should sort out the memory allocation problem. – Don Cruickshank Jun 15 '18 at 22:20
  • Thanks a lot. OK, it doesn't work as posted, but that is because there is an issue with the garbage collector when using a pipeline call: I had to change `[System.GC]::Collect()` to `[System.GC]::GetTotalMemory('forceFullCollection') | Out-Null`, and now it works. Thanks for your help! [GC Issue](https://stackoverflow.com/questions/31620763/no-garbage-collection-while-powershell-pipeline-is-executing) [Work-Around](http://www.jhouseconsulting.com/2017/09/25/addressing-the-powershell-garbage-collection-bug-1825) – Francois A Jun 19 '18 at 18:45
  • @FrancoisA Thanks for the feedback. I've edited that into the answer. – Don Cruickshank Jun 19 '18 at 19:03

Well, here is mine. I'm not as fancy as the rest:

$in = Get-Content input.csv | Select -Skip 1
$out = New-Object System.Collections.ArrayList
foreach($row in $in){
    $parts = $row.Split(',')
    $id = $parts[0]
    foreach($data in $parts[1..$parts.Count]){
        if (-not [string]::IsNullOrEmpty($data)) {
            # [PSCustomObject] preserves property order, so Export-Csv emits ID,Data
            $temp = [PSCustomObject]@{ 'ID' = $id; 'Data' = $data }
            $out.Add($temp) | Out-Null
        }
    }

}
$out | Export-CSV output.csv -NoTypeInformation
EBGreen

You can do something like this:

# Convert csv to object
$csv = ConvertFrom-Csv @"
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555"
"@

# Ignore common members and the ID property
$excludedMembers = @(
    'GetHashCode',
    'GetType',
    'ToString',
    'Equals',
    'ID'
)

$results = @()

# Iterate around each csv row
foreach ($row in $csv) {
    $members = $row | Get-Member
    # Iterate around each member from the 'row object' apart from our 
    # exclusions and empty values
    foreach ($member in $members | 
        Where { $excludedMembers -notcontains $_.Name -and $row.($_.Name)}) {
        # add to array of objects
        $results += @{ ID=$row.ID; DATA=$row.($member.Name)}
    }
}

# Write the csv string
$outstring = "ID,DATA"
$results | foreach { $outstring += "`n$($_.ID),$($_.DATA)" }

# New csv object
$csv = $outstring | ConvertFrom-Csv

Probably not the most elegant solution, but it should do what you need. I left some comments explaining what it does.

Bassie

If you only want to accept a limited number of DATA columns (e.g. 5), you could do:

ForEach ($i in 1..5) {$CSV | ? {$_."Data$i"} | Select ID, @{N='Data'; E={$_."Data$i"}}}

And if you have a potentially unlimited number of DATA columns:

ForEach ($Data in ($CSV | Select "Data*" -First 1).PSObject.Properties.Name) {
    $CSV | ? {$_.$Data} | Select ID, @{N='Data'; E={$_.$Data}}
}
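For context, here is a minimal end-to-end sketch of wiring the dynamic variant up, using the question's sample data inline (in practice you would use `$CSV = Import-Csv input.csv`; the file names are placeholders):

```powershell
# Sample data taken from the question; replace with: $CSV = Import-Csv input.csv
$CSV = ConvertFrom-Csv @"
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",
"@

# Take the DATA* property names from the first row, then unpivot each
# column in turn, skipping rows where that column's cell is empty.
$rows = ForEach ($Data in ($CSV | Select-Object "Data*" -First 1).PSObject.Properties.Name) {
    $CSV | Where-Object { $_.$Data } | Select-Object ID, @{ N = 'Data'; E = { $_.$Data } }
}

$rows | Export-Csv output.csv -NoTypeInformation
```

Note that this emits rows grouped by source column (all DATA1 matches first, then DATA2, and so on) rather than grouped by ID as in the question's sample output; regroup by ID afterwards if that ordering matters.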
iRon