I have the following PowerShell:

# Find all .csproj files 
$csProjFiles = get-childitem ./ -include *.csproj -recurse 

# Remove the packages.config include from the csproj files.
$csProjFiles | foreach ($_) {(get-content $_) | 
             select-string -pattern '<None Include="packages.config" />' -notmatch | 
             Out-File $_ -force}

And it seems to work fine. The line with the packages.config is not in the file after I run.

But after I run it, there is an extra newline at the TOP of the file. (Not the bottom.)

I am confused as to how that is getting there. What can I do to get rid of the extra newline char that this generates at the top of the file?

UPDATE:

I swapped out to a different way of doing this:

$csProjFiles | foreach ($_) {$currentFile = $_; (get-content $_) | 
               Where-Object {$_ -notmatch '<None Include="packages.config" />'} | 
               Set-Content $currentFile -force}

It works fine and does not have the extra line at the top of the file. But I wouldn't mind knowing why the top example was adding the extra line.

mklement0
Vaccano
    IMHO `foreach ($_) {(get-content $_)` should/could be `foreach {get-content $_ |...` –  Feb 25 '17 at 00:09

1 Answer

  • Out-File and the redirection operators > / >> take arbitrary input objects and convert them to string representations as they would appear in the console (that is, PowerShell's default output formatting is applied) and send those string representations to the output file.
    These string representations often have leading and/or trailing newlines for readability.

  • Set-Content is for input objects that are already strings or should be treated as strings.

    • PowerShell calls .psobject.ToString() on all input objects to obtain the string representation, which in most cases defers to the underlying .NET type's .ToString() method.

The resulting representations are typically not the same, and it's important to know when to choose which cmdlet / operator.
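
As a minimal illustration of the difference, the following sketch writes the same [version] object with both cmdlets. The scratch-file names are hypothetical, and the exact table layout that Out-File produces may vary between PowerShell versions.

```powershell
# Hypothetical scratch files for the comparison.
$tempDir        = [System.IO.Path]::GetTempPath()
$outFilePath    = Join-Path $tempDir 'demo-out-file.txt'
$setContentPath = Join-Path $tempDir 'demo-set-content.txt'

$obj = [version] '1.2.3'

# Out-File applies default output formatting (for [version], a table with headers).
$obj | Out-File $outFilePath

# Set-Content stringifies the object, which here defers to [version]'s .ToString().
$obj | Set-Content $setContentPath

$viaOutFile    = @(Get-Content $outFilePath)
$viaSetContent = @(Get-Content $setContentPath)

"Out-File wrote $($viaOutFile.Count) line(s); Set-Content wrote $($viaSetContent.Count) line(s)."

Remove-Item $outFilePath, $setContentPath
```

The Set-Content file should contain just 1.2.3, whereas the Out-File file contains the multi-line table you would see in the console.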

Additionally, the default character encodings differ:

  • Out-File and > / >> default to UTF-16 LE, which PowerShell calls Unicode in the context of the optional -Encoding parameter.

  • Set-Content defaults to your system's legacy "ANSI" code page (a single-byte, extended-ASCII code page), which PowerShell calls Default.

    • Note that the docs as of PSv5.1 mistakenly claim that the default is ASCII.[1]
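
One way to see which encoding a given call actually produced is to inspect the file's first bytes. The sketch below passes -Encoding explicitly so that the result does not depend on the defaults of the edition you are running; Get-LeadingBytes is a hypothetical helper, not a built-in command, and the scratch-file name is likewise made up.

```powershell
# Hypothetical helper: returns the first $Count bytes of a file as hex text.
function Get-LeadingBytes {
    param([string] $Path, [int] $Count = 4)
    $bytes = [System.IO.File]::ReadAllBytes((Convert-Path -LiteralPath $Path))
    $take  = [Math]::Min($Count, $bytes.Length)
    if ($take -eq 0) { return '' }
    ($bytes[0..($take - 1)] | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
}

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-encoding.txt'

# UTF-16 LE: the file starts with the BOM FF FE, followed by 'h' as 68 00.
'hi' | Out-File $path -Encoding Unicode
$utf16 = Get-LeadingBytes $path

# ASCII: no BOM, one byte per character ('h' = 68, 'i' = 69).
'hi' | Set-Content $path -Encoding Ascii
$ascii = Get-LeadingBytes $path -Count 2

"UTF-16 LE: $utf16"
"ASCII:     $ascii"

Remove-Item $path
```

Omitting -Encoding from either call and re-running the helper shows the defaults of your own PowerShell edition.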

To change the encoding:

  • Ad-hoc change: Use the -Encoding parameter with Out-File or Set-Content to control the output character encoding explicitly.
    You cannot change the encoding used by > / >> ad-hoc, but see below.

  • [PSv3+] Changing the default (use with caution): Use the $PSDefaultParameterValues mechanism (see Get-Help about_Parameters_DefaultValues), which enables setting default values for parameters:

    • Changing the default encoding for Out-File also changes it for > / >> in PSv5.1 or above[2].
      To change it to UTF-8, for instance, use:
      $PSDefaultParameterValues['Out-File:Encoding']='UTF8'

    • Note that in PSv5.0 or below you cannot change what encoding > and >> use.

    • If you change the default for Set-Content, be sure to change it for Add-Content too:
      $PSDefaultParameterValues['Set-Content:Encoding'] = $PSDefaultParameterValues['Add-Content:Encoding'] ='UTF8'

    • You can also use wildcard patterns to represent the cmdlet / advanced function name to apply the default parameter value to; for instance, if you used $PSDefaultParameterValues['*:Encoding']='UTF8', then all cmdlets that have an -Encoding parameter would default to that value, but that is ill-advised, because in some cmdlets the -Encoding refers to the input encoding.

    • There is no single shared prefix among cmdlets that write to files that allows you to target all output cmdlets, but you can define a pattern for each of the verbs:
      $enc = 'UTF8'; $PSDefaultParameterValues += @{ 'Out-*:Encoding'=$enc; 'Set-*:Encoding'=$enc; 'Add-*:Encoding'=$enc; 'Export-*:Encoding'=$enc }

    • Caveat: $PSDefaultParameterValues is defined in the global scope, so any modifications you make to it take effect globally, and affect subsequent commands.
      To limit changes to a script / function's scope and its descendant scopes, use a local $PSDefaultParameterValues variable, which you can either initialize to an empty hashtable to start from scratch ($PSDefaultParameterValues = @{}), or initialize to a clone of the global value ($PSDefaultParameterValues = $PSDefaultParameterValues.Clone()).
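
The scoping caveat above can be sketched as follows. Write-Report and the scratch-file name are hypothetical, and BigEndianUnicode is chosen only because it is not the default in either PowerShell edition, which makes the effect of the preset easy to verify. The sketch assumes that 'Out-File:Encoding' is not already set globally in your session.

```powershell
# Hypothetical function that wants a specific encoding for everything it writes.
function Write-Report {
    param([string] $Path)

    # Local clone: shadows the global variable for this scope and its descendants only.
    $PSDefaultParameterValues = $PSDefaultParameterValues.Clone()
    $PSDefaultParameterValues['Out-File:Encoding'] = 'BigEndianUnicode'

    # No -Encoding argument here; the local preset supplies it.
    'report' | Out-File $Path
}

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-report.txt'
Write-Report $path

# UTF-16 BE files start with the BOM bytes FE FF.
$bom = [System.IO.File]::ReadAllBytes($path)[0..1]

# The caller's (global) presets are untouched by the function's local copy.
$leakedToCaller = $PSDefaultParameterValues.ContainsKey('Out-File:Encoding')

'BOM: {0:X2} {1:X2}; leaked to caller: {2}' -f $bom[0], $bom[1], $leakedToCaller

Remove-Item $path
```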

Caveats:

  • Using the utf8 encoding in Windows PowerShell invariably creates UTF-8 files with a BOM. (Commendably, in PowerShell [Core] v6+ it does not, and this edition even consistently defaults to BOM-less UTF-8; however, you can create a BOM on demand with utf8BOM.)

  • However, if you're running Windows 10 and you're willing to switch to BOM-less UTF-8 encoding system-wide - which can have side effects - even Windows PowerShell can be made to use BOM-less UTF-8 consistently - see this answer.


In the case at hand, the output objects are [Microsoft.PowerShell.Commands.MatchInfo] instances output by Select-String:

  • Using default formatting, as happens with Out-File, they output an empty line above, and two empty lines below (with multiple instances printing in a contiguous block between a single set of the empty lines above and below).

  • If you call .psobject.ToString() on them, as happens with Set-Content, they evaluate to just the matching lines (with no origin-path prefix, given that input was provided via the pipeline rather than as filenames via the -Path / -LiteralPath parameters), with no leading or trailing empty lines.

That said, had you piped to | Select-Object -ExpandProperty Line or simply | ForEach-Object Line in order to explicitly output just the matching lines as strings, both Out-File and Set-Content would have yielded the same result (except for their default encoding).
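
A small sketch of that approach, using made-up input lines and hypothetical scratch files in place of a real .csproj file:

```powershell
$tempDir       = [System.IO.Path]::GetTempPath()
$rawPath       = Join-Path $tempDir 'demo-matchinfo.txt'         # MatchInfo objects via Out-File
$outFilePath   = Join-Path $tempDir 'demo-lines-out-file.txt'    # strings via Out-File
$setContentPath = Join-Path $tempDir 'demo-lines-set-content.txt' # strings via Set-Content

# Stand-in for the content of a project file.
$lines = '<Project>', '  <None Include="packages.config" />', '</Project>'

# Select-String emits [MatchInfo] objects, not strings.
$kept = $lines | Select-String -Pattern '<None Include="packages.config" />' -NotMatch

# Formatting is applied to the MatchInfo objects, which typically adds empty lines.
$kept | Out-File $rawPath

# Extracting the .Line property first yields plain strings, so both cmdlets agree.
$kept | ForEach-Object Line | Out-File $outFilePath
$kept | ForEach-Object Line | Set-Content $setContentPath

$rawLineCount  = @(Get-Content $rawPath).Count
$viaOutFile    = @(Get-Content $outFilePath)
$viaSetContent = @(Get-Content $setContentPath)

"MatchInfo via Out-File: $rawLineCount line(s); strings via Out-File: $($viaOutFile.Count); strings via Set-Content: $($viaSetContent.Count)"

Remove-Item $rawPath, $outFilePath, $setContentPath
```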


P.S.: LotPing's observation is correct: You seem to be confusing the foreach statement with the ForEach-Object cmdlet (which, regrettably, is also known by built-in alias foreach, causing confusion).

The ForEach-Object cmdlet doesn't need an explicit definition for $_: in the (implied -Process) script block you pass to it, $_ is automatically defined to be the input object at hand.

Your ($_) argument to foreach (ForEach-Object) is effectively ignored, because it evaluates to $null: when used outside of special contexts, such as script blocks in the pipeline, the automatic variable $_ evaluates to $null, and putting (...) around it makes no difference, so you're effectively passing $null, which is ignored.
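
For comparison, here are the two constructs side by side, in a minimal sketch with made-up input values:

```powershell
$numbers = 1, 2, 3

# foreach *statement*: you name the iteration variable yourself and no pipeline is involved.
$viaStatement = foreach ($n in $numbers) { $n * 2 }

# ForEach-Object *cmdlet*: receives pipeline input; $_ is defined automatically
# inside the script block, so no explicit declaration is needed.
$viaCmdlet = $numbers | ForEach-Object { $_ * 2 }

"Statement: $($viaStatement -join ', ')"
"Cmdlet:    $($viaCmdlet -join ', ')"
```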


[1] Verify that ASCII is not the default as follows: '0x{0:x}' -f $('ä' | Set-Content t.txt; $b=[System.IO.File]::ReadAllBytes("$PWD\t.txt")[0]; ri t.txt; $b) yields 0xe4 on an en-US system, which is the Windows-1252 code point for ä (which coincides with the Unicode codepoint, but the output is a single-byte-encoded file with no BOM).
If you use -Encoding ASCII explicitly, you get 0x3f, the code point for literal ?, because that's what using ASCII converts all non-ASCII chars. to.

[2] PetSerAl found the source-code location that shows that > and >> are effective aliases for Out-File [-Append], and he points out that redefining Out-File therefore also redefines > / >>; similarly, specifying a default encoding via $PSDefaultParameterValues for Out-File also takes effect for > / >>.
Windows PowerShell v5.1 is the minimum version that works this way.

Tip of the hat to PetSerAl for his help.

mklement0