This is a duplicate of How to exit from ForEach-Object in PowerShell. I'm not sure how to mark it as such, but I'm asking again because the answers to that question contain a lot of helpful information, including several ways to handle this. However, the accepted answer from 2012 doesn't seem correct, and the nuggets of wisdom are buried when they shouldn't be.

My script loops through all of the CSV files in a directory and takes action on those that contain records with the wrong number of columns. The first run took a long time, and I realized that some of the CSVs have the correct number of columns in every row and need no action. So I decided to check each file for at least one row with an incorrect column count and only then act on the file. To do that, I needed to exit my ForEach-Object loop as soon as I found a qualifying row.

Here is the original script:

$ParentPath = "C:\Users\<username>\Documents\Temporary\*"
$Files = Get-ChildItem -Path $ParentPath -Include *.csv
foreach ($File in $Files) {
    $OldPath = $File.FullName
    $OldContent = Get-Content $OldPath
    # Keep only the rows that have exactly 11 comma-separated columns.
    $NewContent = $OldContent |
        Where-Object { $_.Split(",").Count -eq 11 }
    Set-Content -Path "$(Split-Path $OldPath)\$($File.BaseName).txt" -Value $NewContent
}

Implementing a 'pre-check' on the files proved difficult. I was able to use an answer from SO to write out the 'bad lines', but I was unable to exit the ForEach-Object loop without processing all rows, which defeats the entire purpose of pre-checking the files for offending rows.

The code below works well for identifying the first bad row; if you remove the ;break, it writes out every offending row:

Get-Content $OldPath | ?{$_} | %{if(!($_.split(",").Count -eq 11)){"Process stopped at line number $($_.ReadCount), incorrect column count of: $($_.split(",").Count).";break}}

So how do I combine the two scripts to pre-check files for an offending row, and then take action on the files that need it? See proposed answer below!

immobile2

3 Answers

TL;DR: here's my updated script:

$ParentPath = "C:\Users\<username>\Documents\Temporary\*"
$Files = Get-ChildItem -Path $ParentPath -Include *.csv
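foreach ($File in $Files) {
    $OldPath = $File.FullName
    # Reset the flag for each file; it is set as soon as the first bad row is found.
    $NeedsFix = $false
    Get-Content $OldPath | Where-Object { $_.Split(",").Count -ne 11 } | Select-Object -First 1 | ForEach-Object { $NeedsFix = $true }
    # Rewrite only the files that contain at least one bad row.
    if ($NeedsFix) { Get-Content $OldPath | Where-Object { $_.Split(",").Count -eq 11 } | Set-Content -Path "$(Split-Path $OldPath)\$($File.BaseName).txt" }
}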

To get here, let's summarize a few answers from the related question:

  • Stoffi does an incredible job of outlining the different outputs to expect from using break, continue, and return in a ForEach-Object loop and the ForEach method. Also see MS-ScriptingGuy, though this doesn't solve my conundrum.
  • The first viable answer comes in the form of throwing and catching an exception. Those terms are currently foreign to me, but you might dig it!
  • Another viable option comes in the form of limiting your loops with a Where-Object, pretty slick! (See answers from ThePennyDrops, Rikki, Eddi Kumar if that floats your boat.)
  • I haven't tested Alex Hague's solution, but using labels with the break keyword seems like it might work for both ForEach-Object loops and foreach loops.
  • The pièce de résistance: what if you just tell your pipeline to stop the ForEach-Object loop once it has selected the first instance you're looking for? Sound too simple? Maybe... but it seems to work!
    • Credit here goes to @zett42's incredibly simple solution, which has far too few upvotes, is very easy to add to a pipeline, and has the potential to be far more flexible than many of the other answers/solutions.
      • Need to check for 6 occurrences of problem rows before quitting the loop? How about using -First 6? Sure beats creating a counter variable.
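
To make the Select -First idea concrete, here is a minimal sketch. The sample data, the 3-column rule, and the $examined counter are all assumptions made up for illustration; the counter exists only to show how many rows travel down the pipeline before it stops.

```powershell
# Hypothetical sample data: 1,000 rows, of which every 100th has too few columns.
$lines = 1..1000 | ForEach-Object { if ($_ % 100 -eq 0) { 'a,b' } else { 'a,b,c' } }

# Counts how many rows actually travel down the pipeline.
$examined = 0

$firstBad = $lines |
    ForEach-Object { $examined++; $_ } |
    Where-Object { $_.Split(',').Count -ne 3 } |
    Select-Object -First 2    # stops the upstream commands after the 2nd match

"Bad rows returned: $($firstBad.Count)"
"Rows examined:     $examined"
```

With this sample data, the second bad row is row 200, so the pipeline should stop there instead of reading all 1,000 rows.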
immobile2

Here's a PowerShell-idiomatic reformulation of your solution (PSv4+) that should perform much better.

Note, however, that it assumes that each .csv file fits into memory as a whole:

Get-ChildItem -LiteralPath $HOME\Documents\Temporary -Filter *.csv | ForEach-Object {
  $okRows, $brokenRows = (Get-Content -ReadCount 0 -LiteralPath $_.FullName).
                           Where({ $_.split(",").Count -eq 11 }, 'Split')
  if ($brokenRows) {
    Set-Content -LiteralPath "$($_.DirectoryName)\$($_.BaseName).txt" -Value $okRows
  } 
}

To address the question implied by your question's title:

  • Unfortunately, as of PowerShell Core 7.2 there is still no direct way to prematurely stop a pipeline on demand:

    • Select-Object -First is capable of that, but it uses a non-public exception type, so the behavior is limited to exiting the pipeline after the first N input objects.

    • GitHub issue #3821 is a long-standing feature request to make this capability generally available.

  • While the break statement is not directly suitable for exiting a pipeline on demand - it looks for an enclosing loop statement to break out of, on the entire call stack, and, if it finds none, terminates execution overall - you can make it work with a dummy loop.

    • See this answer for more information.

    • Quick example:

      # Use of the dummy `do { ... } while ($false)` loop enables `break`
      # to exit the pipeline.
      do { 1..10 | ForEach-Object { $i=0 } { $_; if (++$i -eq 2) { break } } } while ($false)
      
  • The caveat with both Select-Object -First and break + dummy loop is that they skip the end block (for cmdlets written in PowerShell) / EndProcessing() method (for binary cmdlets) of all upstream cmdlets, which may cause problems.
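
The skipped end block can be observed with a small sketch. Test-Upstream is a hypothetical function written only for this illustration, and the $script:endRan flag is an assumption used to record whether its end block ran.

```powershell
# Hypothetical pass-through function that records whether its end block ran.
function Test-Upstream {
    [CmdletBinding()]
    param([Parameter(ValueFromPipeline)] $InputObject)
    begin   { $script:endRan = $false }
    process { $InputObject }
    end     { $script:endRan = $true }
}

# Select-Object -First stops the pipeline early, so the end block is skipped.
$null = 1..5 | Test-Upstream | Select-Object -First 2
$endRanWhenStopped = $script:endRan

# Without an early stop, all input is processed and the end block runs.
$null = 1..5 | Test-Upstream
$endRanWhenComplete = $script:endRan

"End block ran when stopped early:     $endRanWhenStopped"
"End block ran when run to completion: $endRanWhenComplete"
```

If an upstream command does its real work in the end block (sorting, grouping, flushing a buffer, releasing a resource), that work would not happen in the first case.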

mklement0
  • I _usually_ don't like it when an answer has the PowerShell-idiomatic way of doing something, but in this case - kudos and thank you! It is indeed faster and actually runs out of the box, which isn't always the case with answers. It'll take me a little bit of time to unpack it all and figure out why it is faster, but the -Filter vs. -Include is definitely a leg up. If I am loading it into memory, do you happen to know if an even faster option might be to replace the ForEach-Object with a foreach loop? Drawing on something I read [here](https://powershell.one/tricks/performance/pipeline) – immobile2 Aug 18 '21 at 13:27
  • Re performance: `-Filter` indeed speeds things up (it delegates the filtering to the file-system), though with a single directory that won't matter much. Yes, using a `foreach` statement is faster than `ForEach-Object`, though with a cmdlet call (which has to stream its results, which the engine must collect before starting the enumeration), the speed-up isn't dramatic. See the bottom section of [this answer](https://stackoverflow.com/a/48888108/45375) for a performance comparison of member enumeration, the `foreach` statement, the `ForEach-Object` cmdlet, and the PSv4+ `.ForEach()` array method. – mklement0 Aug 18 '21 at 14:07
  • RE Idiomatic: Some people will post an answer or comment that simply replaces words like `ForEach-Object{}` with `%{}` and then say _here's a real idiomatic version_. In that case, there's little added value. As for -Filter/-Include, the difference is milliseconds, but 50% better give or take even if you're just using Get-ChildItem to delete all of the .txts and restart with a fresh set of files – immobile2 Aug 18 '21 at 15:10
  • Understood, @immobile2. As for `-Include`: note that with a single wildcard pattern `Get-ChildItem -Path $HOME\Documents\Temporary\*.csv` works too (though `-Filter` is still fastest). A potential gotcha re `-Filter`: it uses the wildcard language of the platform's API, not PowerShell's, which means that character sets and ranges (`[...]`) aren't supported; additionally, on Windows `-Filter` has many legacy quirks. – mklement0 Aug 18 '21 at 15:13
  • Could you possibly help explain a few things that I haven't seen before or used? I have the $okRows and $brokenRows figured out, but the code after `$brokenRows =` is a bit different. Are the parentheses required before `Get-Content`? and is the `).Where(` functioning the same as `Where-Object`/`?{}` would function? If so, why call it with a dot vs. the pipeline? – immobile2 Aug 18 '21 at 15:16
  • @immobile2: [`.Where()` is a _method_](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Arrays#where) and therefore operates in _expression mode_ and requires the collection it operates on to be in memory in full already, hence the need for `(...)`. `.Where()` is faster than the `Where-Object` _cmdlet_ (the pipeline is always slower), and `.Where()` has additional features that I _wish_ were available in the cmdlet as well - see [GitHub issue #13834](https://github.com/PowerShell/PowerShell/issues/13834). – mklement0 Aug 18 '21 at 16:48
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/236162/discussion-between-immobile2-and-mklement0). – immobile2 Aug 18 '21 at 21:10

You can effectively exit a ForEach-Object "loop" with the following construction:

$objects = "Brakes","Wheels","Windows"
$Break = $False
$objects | Where-Object { $Break -eq $False } | ForEach-Object {
 $Break = $_ -eq "Wheels";
 Write-Output "The car has $_.";
}

I did not invent this myself; I found it at https://linuxhint.com/how-to-exit-from-foreach-object-in-powershell/
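
A small sketch of what this pattern does under the hood, using a hypothetical $seen counter and an extra 'Doors' item added purely for illustration: the flag makes Where-Object discard everything after the match, but the remaining input is still enumerated, so the pipeline is filtered rather than stopped.

```powershell
$objects = 'Brakes', 'Wheels', 'Windows', 'Doors'
$stop = $false
$seen = 0     # hypothetical counter: how many items Where-Object inspects

$output = $objects |
    Where-Object { $seen++; -not $stop } |
    ForEach-Object {
        # Once 'Wheels' has been handled, the filter rejects all later items.
        $stop = $_ -eq 'Wheels'
        "The car has $_."
    }

$output
"Items inspected by Where-Object: $seen"
```

Only the first two items produce output, yet Where-Object still inspects all four. That is fine for small collections, but it does not save the cost of reading a large input.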