This seems simple enough, but I cannot get anything I try to work. I am trying to remove the CRLF from the ends of lines that don't meet my criteria, then write the result to a new file. For example, this section:

One~Two~Three~Four
Test Plan Pay Work~scheduled payment pending~79f1cf6e~3/8/2020 6:13:07 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM
Test Plan Pay Work~GetCardInfo 
{failed to validate card
}
~f124a822-aa8d-4624-bb8c-ddsfgdfcc21fb~3/8/2020 6:14:31 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM

Should output to look like:

One~Two~Three~Four
Test Plan Pay Work~scheduled payment pending~79f1cf6e~3/8/2020 6:13:07 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM
Test Plan Pay Work~GetCardInfo {failed to validate card}~f124a822~3/8/2020 6:14:31 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM

Being a newbie I have tried:

Get-Content "C:\temp\errors.csv" | ForEach-Object {
  if ((!$_.EndsWith("AM") -and !$_.EndsWith("PM") -and !$_.EndsWith("Four")))
    {
       $_ -replace ("`r`n",' ')
    }
} | Out-File C:\temp\errors2.csv

But this does not work. Any ideas? It seems simple, but I cannot get it to work whatever I try.

1 Answer

By default, Get-Content splits the text into separate lines and removes the newline characters, so there is never a CRLF left for -replace to find inside ForEach-Object. To prevent that, use the -Raw parameter. Now you can process the text as a whole, using the regular-expression-based -replace operator:

(Get-Content 'errors.csv' -Raw) -replace '(?<!AM|PM|Four)\r\n', ' ' | 
    Out-File 'errors2.csv'

The parentheses around the Get-Content call allow the use of the output of the command directly as the left-hand side operand of the -replace operator (see Grouping Operator).
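To see the difference between the default behavior and -Raw, here is a minimal sketch. The temporary file name raw-demo.txt and its two-line content are made up for illustration and are not part of the original question.

```powershell
# Hypothetical scratch file in the temp directory, written with CRLF line endings.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.txt'
[System.IO.File]::WriteAllText($path, "a`r`nb`r`n")

# Default: one string per line, line endings already stripped.
$lines = Get-Content $path

# -Raw: the whole file as a single string, line endings preserved.
$whole = Get-Content $path -Raw

$lineCount       = @($lines).Count           # number of line strings
$lineHasCr       = $lines[0].Contains("`r")  # does the first line still carry a CR?
$wholeHasNewline = $whole.Contains("`r`n")   # does the raw text still carry CRLF?

Remove-Item $path
```

With the default call, each pipeline item is a bare line, which is why replacing "`r`n" inside ForEach-Object has nothing to match.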

Output:

One~Two~Three~Four
Test Plan Pay Work~scheduled payment pending~79f1cf6e~3/8/2020 6:13:07 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM
Test Plan Pay Work~GetCardInfo  {failed to validate card } ~f124a822-aa8d-4624-bb8c-ddsfgdfcc21fb~3/8/2020 6:14:31 PM
Test Plan Pay Work~Bad Request~680a0bb2~3/8/2020 6:14:00 AM

Regular expression breakdown:

  • (?<! starts a negative lookbehind assertion
    • AM|PM|Four any of literal AM, PM or Four
  • ) ends the negative lookbehind assertion
  • \r\n the line-ending characters: a carriage return followed by a line feed (CRLF)

The negative lookbehind assertion makes the regex match only if the line-ending characters are not preceded by AM, PM or Four. The lookbehind itself is not part of the match result, so only the line-ending characters are replaced.
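The effect of the lookbehind can be tried on an in-memory string without touching any file. The sample text below is a shortened, made-up variant of the data in the question.

```powershell
# Sample text with CRLF line endings (shortened stand-in for the question's data).
$text = "One~Two~Three~Four`r`nA~GetCardInfo`r`n{failed}`r`n~id~6:14:31 PM`r`nB~ok~id~6:14:00 AM`r`n"

# Replace only those CRLF pairs that are NOT preceded by AM, PM or Four.
$joined = $text -replace '(?<!AM|PM|Four)\r\n', ' '

# The three physical lines of the broken record are now one line;
# every CRLF that follows AM, PM or Four is left in place.
$joined
```

Note that -replace is case-insensitive by default, so a line ending in "am" or "four" would also be treated as complete. If that matters for your data, -creplace is the case-sensitive variant.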

See also: Regex lookahead, lookbehind and atomic groups

Note:

This approach loads the entire file into memory, because it uses Get-Content -Raw. If the file is too big for that, processing the input line by line with the default Get-Content (possibly in chunks, using the -ReadCount parameter) would be feasible, but a bit more complicated.
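A rough sketch of such a line-by-line variant follows. The function name Join-BrokenLine is hypothetical and not part of the original answer; it assumes, like the regex above, that a record is complete once a line ends with AM, PM or Four. The sample lines are made up.

```powershell
# Hypothetical helper: collects physical lines in a buffer until one ends
# with a marker (AM, PM or Four), then emits the buffer as one record.
function Join-BrokenLine {
    param(
        [Parameter(ValueFromPipeline)]
        [string] $Line
    )
    begin { $buffer = New-Object System.Text.StringBuilder }
    process {
        $null = $buffer.Append($Line)
        if ($Line -match '(AM|PM|Four)$') {
            # Record is complete: emit it and start a new one.
            $buffer.ToString()
            $null = $buffer.Clear()
        }
        else {
            # Record continues on the next line; the space stands in for the CRLF.
            $null = $buffer.Append(' ')
        }
    }
    end {
        # Emit a trailing incomplete record, if any, without the dangling space.
        if ($buffer.Length -gt 0) { $buffer.ToString().TrimEnd() }
    }
}

# In-memory stand-in for the lines that Get-Content would produce.
$lines  = 'One~Two~Three~Four', 'A~GetCardInfo', '{failed}', '~id~6:14:31 PM', 'B~ok~id~6:14:00 AM'
$result = $lines | Join-BrokenLine
$result
```

For a real file, the same function could sit between Get-Content 'errors.csv' and Set-Content 'errors2.csv', so that only one record at a time is held in memory.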

zett42