
I'm working on 'fixing' data in CSVs, and I'm using PowerShell because the number of rows exceeds what Excel handles comfortably, and I usually have multiple CSVs in a folder that all need the same updates. I'm having trouble finding a concise way of checking various columns for data and then making updates as needed. I can do it with multiple pipelines/commands, but I'd like to do it all at once, and I'm also wary of doing something like $csvData = Import-CSV... because there are A LOT of rows.

Below are the 3 commands I'm using to start with, any ideas on how to run them all at once?

Import-CSV $Filepath | %{if([string]::IsNullOrEmpty($_."Type of Admission")){$_."Type of Admission" = "9"} ; $_} | Export-CSV $NewFilepath -UseQuotes AsNeeded
Import-CSV $Filepath | %{if([string]::IsNullOrEmpty($_."Service from Dates") -And !([string]::IsNullOrEmpty($_."Service to Dates"))){$_."Service from Dates" = $_."Service to Dates"} ; $_} | Export-CSV $NewFilepath -UseQuotes AsNeeded
Import-CSV $Filepath | %{if(($_."Financial Class" -eq "Self-Pay")){$_."Payer" = $_."Health Plan Identification Number" = $_."Financial Class"} ; $_} | Export-CSV $NewFilepath -UseQuotes AsNeeded
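For reference, here is a sketch of what the three commands look like combined into a single streaming pipeline (one pass over the file, memory usage stays flat); this assumes the same $Filepath/$NewFilepath variables as above, and note that -UseQuotes requires PowerShell 7+:

```powershell
# One pass over the file: all three fixes applied per row, streamed to disk.
Import-CSV $Filepath | ForEach-Object {
    if ([string]::IsNullOrEmpty($_."Type of Admission")) {
        $_."Type of Admission" = "9"
    }
    if ([string]::IsNullOrEmpty($_."Service from Dates") -and
        -not [string]::IsNullOrEmpty($_."Service to Dates")) {
        $_."Service from Dates" = $_."Service to Dates"
    }
    if ($_."Financial Class" -eq "Self-Pay") {
        # Chained assignment: both columns get the "Financial Class" value.
        $_."Payer" = $_."Health Plan Identification Number" = $_."Financial Class"
    }
    $_   # emit the (possibly modified) row back into the pipeline
} | Export-CSV $NewFilepath -UseQuotes AsNeeded
```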

One thing I want to try is using the switch statement instead of a deeply nested If. My idea with Switch was something like a "With" statement in VBA combined with a Select Case that runs every true case. So instead of Switch($_.FirstProperty), or even If($_.FirstProperty), could I just say Switch($_) and then reference each property/column as .FirstProperty? Alas, I don't think it will work like that - but if it did, then I suppose my code would be something like below:

Import-CSV $Filepath | %{
    Switch($_) {
        {[string]::IsNullOrEmpty($_."Type of Admission")} {$_."Type of Admission" = "9"}
        {[string]::IsNullOrEmpty($_."Service from Dates") -And -Not [string]::IsNullOrEmpty($_."Service to Dates")} {$_."Service from Dates" = $_."Service to Dates"}
        {$_."Financial Class" -eq "Self-Pay"} {$_."Payer" = $_."Health Plan Identification Number" = $_."Financial Class"}
    }
    $_
} | Export-CSV $NewFilepath -UseQuotes AsNeeded

EDIT: For anyone planning to use Cpt.Whale's suggestion of keeping the data in memory instead of writing to disk on each line: it works much as you'd expect, but CSV data is a little special compared to your standard Get-Content/Set-Content workflow. Specifically, a variable like $report needs special characteristics. My recommendation at the moment is to use $outFile = New-Object System.Collections.ArrayList and the .Add method to populate it as you go. Some will say this is bad practice, but this question probably won't get too high on the list, and it's fast and it works.
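A minimal sketch of that collect-in-memory approach, using the same $Filepath/$NewFilepath variables as above (note that [System.Collections.Generic.List[object]] is the usual modern alternative to ArrayList):

```powershell
# Collect modified rows in memory, then write the file once at the end.
$outFile = New-Object System.Collections.ArrayList

foreach ($row in (Import-CSV $Filepath)) {
    if ([string]::IsNullOrEmpty($row."Type of Admission")) {
        $row."Type of Admission" = "9"
    }
    # .Add returns the new index; cast to [void] to keep it out of the output.
    [void]$outFile.Add($row)
}

$outFile | Export-CSV $NewFilepath -UseQuotes AsNeeded
```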

immobile2
1 Answer


First, there are ways to read a file one line at a time, but they're usually unnecessary unless you're running out of memory, as with gigabyte-plus-sized CSV files. If that's the case, then I believe you want to use the System.IO.File type accelerator like so instead of Import-CSV:

foreach($row in [System.IO.File]::ReadLines($Filepath)) {
    # Do Stuff to $row
}

There are more examples in the answers to this question, but you'll want to be careful of file stream locks, which can be a pain.
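To illustrate the file-lock caution: if you open a stream yourself rather than using the static ReadLines helper, wrapping it in try/finally so the handle is always released is one way to avoid lingering locks. A rough sketch, assuming a header row you parse yourself:

```powershell
# Manual streaming read with guaranteed disposal, so the file never stays locked.
$reader = [System.IO.StreamReader]::new($Filepath)
try {
    $header = $reader.ReadLine()   # first line: column names
    while (-not $reader.EndOfStream) {
        $line = $reader.ReadLine()
        # Do stuff with $line (e.g. split on ',' for simple, quote-free CSVs)
    }
}
finally {
    $reader.Dispose()              # releases the file handle even on error
}
```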

In addition, huge time savings can be had by processing all of your if statements in the same pass, instead of iterating through the entire file 3 times. For example:

# Import once if you have the memory.
$data = Import-CSV $Filepath

# Iterate once
Foreach ($row in $data) {
    if ( [string]::IsNullOrEmpty($row."Type of Admission") ) { 
        $row."Type of Admission" = "9" 
    }
    if ( [string]::IsNullOrEmpty($row."Service from Dates") -And -Not ([string]::IsNullOrEmpty($row."Service to Dates")) ) {
        $row."Service from Dates" = $row."Service to Dates"
    }
    if ( $row."Financial Class" -eq "Self-Pay" ) {
        $row."Payer" = $row."Health Plan Identification Number" = $row."Financial Class"
    }
    
    # export finalized row (-Append adds each row instead of overwriting the file)
    $row | Export-CSV $NewFilepath -UseQuotes AsNeeded -Append
}

To your last question about Switch: it compares single values, not objects. It's great for many things, but not here unless you have many more conditions to add. It's better suited to handling one column of data - e.g. if 'Type of Admission' is 1 then A; 2 then B; etc.
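That one-column use case might look like the sketch below; the code-to-label mappings here are invented for illustration, not taken from your data:

```powershell
# Map 'Type of Admission' codes to labels; the mappings are hypothetical.
foreach ($row in (Import-CSV $Filepath)) {
    switch ($row."Type of Admission") {
        "1"     { $label = "Emergency" }   # hypothetical mapping
        "2"     { $label = "Urgent" }      # hypothetical mapping
        default { $label = "Unknown" }
    }
    # ... use $label as needed
}
```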

Cpt.Whale
  • Thanks! I plan to run some speed tests on a few different scripts to see how they perform, but I think your answer on processing all of the `if` statements is what I was trying to get to. I had originally shied away from `ForEach` because I thought I wanted to keep it in the pipeline, but on a larger file I'll need to likely load it sometime into memory and then go with it – immobile2 Aug 18 '21 at 01:41
  • @immobile2 If you are primarily looking for speed, I'll recommend something like `$report = Foreach ($row in $data) { # do stuff; Write-Output $row }` then end with `$report | Export-CSV ...` to keep the output in memory as well instead of writing to disk on each row. It's less memory efficient, but worthwhile for speed. – Cpt.Whale Aug 18 '21 at 13:47
  • Thanks Cpt! Definitely helpful and with a bit of tweaking, pretty darn fast. My number of conditions grows each day, so plenty of If blocks to go around – immobile2 Aug 21 '21 at 02:06