I have a small PowerShell script that reads a UTF-8 encoded document, makes some replacements in it, and saves the result to a new file. It looks like this:

(Get-Content $path) -Replace "myregex","replacement" | Set-Content $path2 -Encoding utf8

This creates a new file with the right encoding and the right contents, but with additional newline characters at the end. According to this answer and many others, I should either:

  1. Add the parameter -NoNewLine to Set-Content
  2. Use [System.IO.File]::WriteAllText($path2,$content,[System.Text.Encoding]::UTF8)

Both solutions remove the trailing newlines... and every other newline in the file.

Is there a way to both:

  1. Remove the trailing new lines while saving the file.
  2. Keep the existing new lines in my file.
Marcel Gosselin

2 Answers

[IO.File]::WriteAllText() expects $content to be a single string, but Get-Content produces an array of strings (and strips the line break from the end of each line). When that array is converted to a single string, its elements are joined with the separator stored in $OFS (a single space by default), which is why the line breaks disappear.
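
As a small illustration of that conversion (the sample strings are made up for the demonstration), casting an array to [string] shows the same joining behavior:

```powershell
# Illustration only: how PowerShell turns a string array into one string.
$lines = 'one', 'two', 'three'

# $OFS is undefined by default, so a single space is used as the separator.
[string] $lines          # one two three

# Setting $OFS (here only inside a child scope) changes the separator.
& {
    $OFS = '|'
    [string] $lines      # one|two|three
}
```

Either way, the original line breaks are not restored, so relying on this implicit conversion is rarely what you want when writing a file.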

To avoid this behavior you need to ensure that $content already is a single string when it's passed to WriteAllText(). There are various ways to do that, for instance:

  • Use Get-Content -Raw (PowerShell v3 or newer):

    $content = (Get-Content $path -Raw) -replace 'myregex', 'replacement'
    
  • Pipe the output through Out-String:

    $content = (Get-Content $path | Out-String) -replace 'myregex', 'replacement' -replace '\r\n$'
    

    Note, however, that Out-String (just like Set-Content) adds a trailing line break, as was pointed out in the comments. You need to remove that with a second replacement operation.

  • Join the array with the -join operator:

    $content = (Get-Content $path) -replace 'myregex', 'replacement' -join "`r`n"
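
Putting the first option together with WriteAllText() might look like the following sketch. The temp-file names and the 'foo'/'bar' replacement are placeholders, not part of the original script:

```powershell
# Sketch with hypothetical sample data; the temp-file names are placeholders.
$path  = Join-Path ([IO.Path]::GetTempPath()) 'sample-in.txt'
$path2 = Join-Path ([IO.Path]::GetTempPath()) 'sample-out.txt'

# Create a small UTF-8 input file without a trailing newline.
[IO.File]::WriteAllText($path, "foo 1`r`nfoo 2", [Text.Encoding]::UTF8)

# Read the whole file as ONE string, so the existing line breaks stay in place.
$content = (Get-Content -LiteralPath $path -Raw -Encoding utf8) -replace 'foo', 'bar'

# .NET methods resolve relative paths against the process working directory,
# not the current PowerShell location, so full paths are passed here.
[IO.File]::WriteAllText($path2, $content, [Text.Encoding]::UTF8)

[IO.File]::ReadAllText($path2)   # "bar 1`r`nbar 2", no trailing newline
```

Note that [Text.Encoding]::UTF8 writes a byte order mark; if that is unwanted, an instance created with New-Object System.Text.UTF8Encoding $false can be passed instead.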
    
Ansgar Wiechers
  • Thanks for the explanation and different options to handle that. The `-Raw` parameter was the answer for me. I didn't want to force a type of new line with `-join` operator but rather keep the same from the original file. When I tested the `Out-String` option, I still had new line at the end of the file. I wonder if I did something wrong... – Marcel Gosselin May 02 '17 at 17:07
  • @MarcelGosselin: Regrettably, `Out-String` _always_ appends a trailing newline, and doesn't support `-NoNewline` as of PSv5.1: `(Out-String -InputObject 'a') -match '^a\r?\n$'` returns `$true`. I've created [a GitHub issue](https://github.com/PowerShell/PowerShell/issues/3684) to suggest adding `-NoNewline` to `Out-String`. – mklement0 May 02 '17 at 17:55
  • My mistake. Totally forgot about that. – Ansgar Wiechers May 02 '17 at 19:13

To complement Ansgar Wiechers' helpful answer:

Using Set-Content -NoNewline (PSv5+) is an option, but only if you pass the output as a single string with embedded newlines, which Get-Content -Raw can do:

(Get-Content -Raw $path) -replace 'myregex', 'replacement' |  
  Set-Content -NoNewline $path2 -Encoding utf8

Note, however, that the semantics of -replace change with the use of -Raw: a single -replace operation is now performed on one multi-line string (the entire file contents), as opposed to one operation per line when the LHS is an array.
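
To make the difference concrete, here is a small sketch with made-up sample strings standing in for the file contents. It shows how an anchored pattern behaves in each case:

```powershell
# Stand-ins for what Get-Content returns without and with -Raw.
$lines = 'foo 1', 'foo 2'
$raw   = "foo 1`nfoo 2"

# Array LHS: the pattern is applied to each element separately.
$lines -replace '^foo', 'bar'      # bar 1 / bar 2

# Single string LHS: ^ matches only at the very start of the string.
$raw -replace '^foo', 'bar'        # "bar 1`nfoo 2"

# The (?m) inline option makes ^ match at the start of every line again.
$raw -replace '(?m)^foo', 'bar'    # "bar 1`nbar 2"
```

Patterns that use ^ or $ are the ones most likely to need adjusting when switching to -Raw.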

Also note that -Raw will preserve the trailing-newline-or-not status of the input.

If you want the line-by-line semantics and/or want to ensure that the output's final line has no trailing newline (even if the input file had one), use Get-Content without -Raw, and then -join:

(Get-Content $path) -replace 'myregex', 'replacement' -join [Environment]::NewLine |  
  Set-Content -NoNewline $path2 -Encoding utf8

The above uses the platform-appropriate newline character(s) on output, but note that there's no guarantee that the input file used the same.
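
If you would rather reuse the input file's own newline style, one possible approach is to detect it first. The sketch below assumes the first line break found is representative of the whole file; Get-NewlineStyle is a hypothetical helper, not a built-in cmdlet:

```powershell
# Hypothetical helper: returns the first line break found in the text,
# or the platform default if the text contains none.
function Get-NewlineStyle {
    param([string] $Text)
    if ($Text -match '\r?\n') { $Matches[0] } else { [Environment]::NewLine }
}

# Stand-ins for (Get-Content -Raw $path) and (Get-Content $path).
$sample = "line 1`nline 2`n"
$lines  = 'line 1', 'line 2'

$newline = Get-NewlineStyle $sample
$joined  = $lines -replace 'line', 'row' -join $newline

$joined   # "row 1`nrow 2", joined with the newline style found in $sample
```

With real files this means reading the input twice (once with -Raw to detect the style, once without), which is usually acceptable for small documents.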


As for what you tried:

As you've observed, Set-Content -NoNewline with an array of strings causes all strings to be concatenated without a separator - unlike what one might expect, -NoNewline doesn't just omit a trailing newline:

 > 'one', 'two' | Set-Content -NoNewline t.txt; Get-Content -Raw t.txt
 onetwo  # Strings were directly concatenated.

Note: Newlines embedded in input strings are preserved, however.
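
A quick way to see this for yourself (the temp-file name is a placeholder; -NoNewline requires PSv5+):

```powershell
# Placeholder path for the demonstration.
$file = Join-Path ([IO.Path]::GetTempPath()) 'nonewline-demo.txt'

# The first input string contains an embedded newline; the second does not.
"one`ntwo", 'three' | Set-Content -NoNewline -LiteralPath $file

# The embedded newline survives; only the separator between the two
# input strings (and the trailing newline) is omitted.
Get-Content -Raw -LiteralPath $file   # "one`ntwothree"
```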

The reason for the [IO.File]::WriteAllText() approach not resulting in any newlines is different, as explained in Ansgar's answer.

mklement0