I have a script that updates a configuration file with the current year, but for some reason the copyright symbol is not being inserted correctly. The PowerShell script is UTF-8 with BOM and the JSON file is UTF-8.

The workflow is that I read from a JSON file, update the copyright date, and then save to a JSON file again.

The JSON file info.json:

{
    "CopyrightInfo":  "Copyright © CompanyName 1992"
}

Reproducible excerpt of the PowerShell script:

$path = "./info.json"
$a = Get-Content $path| ConvertFrom-Json
$a.'CopyrightInfo' = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) CompanyName $((Get-Date).Year)"
$a | ConvertTo-Json | set-content $path

I've tried a bunch of ways; the above is the latest attempt. The result looks fine when printed in PowerShell or opened in Notepad, but every other editor or viewer (Visual Studio Code, SourceTree, the Azure DevOps file viewer, etc.) shows the following:

"CopyrightInfo":  "Copyright � CompanyName 2022"

If anyone can explain what I'm doing wrong, that would be great, and even better if they could also show a way to make it work properly.

I'm using PowerShell version 5.1.19041.1682

EDIT: Updated issue with reproducible code excerpts and used PowerShell version.

Kagemand Andersen

2 Answers

Given that you're running Windows PowerShell and that you want to both read the input and create the output as UTF-8-encoded:

  • If it's acceptable to create a UTF-8 file with BOM (which is what Set-Content -Encoding utf8 in Windows PowerShell invariably creates):

    # Note the use of -Encoding utf8 in both statements.
    # (In PowerShell (Core) 7+, neither would be needed,
    # and Set-Content would create a BOM-*less* UTF-8 file;
    # you'd need -Encoding utf8BOM to create one *with* a BOM).
    
    $a = Get-Content -Encoding utf8 $path| ConvertFrom-Json
    # ...
    $a | ConvertTo-Json | Set-Content -Encoding utf8 $path
    
  • Creating a UTF-8 file without a BOM requires more work in Windows PowerShell (whereas BOM-less UTF-8 is now the consistent default in PowerShell (Core) 7+). One workaround takes advantage of the curious fact that New-Item, when given a -Value argument, invariably creates BOM-less UTF-8 files:

    # (In PowerShell (Core) 7+, -Encoding utf8 wouldn't be needed,
    # and Set-Content would create a BOM-*less* UTF-8 file by default.)
    
    $a = Get-Content -Encoding utf8 $path| ConvertFrom-Json
    # ...
    New-Item -Force -Path $path -Value (($a | ConvertTo-Json) + "`r`n")
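
As an alternative to the New-Item workaround, a BOM-less UTF-8 file can also be written by calling .NET directly. The following is a minimal sketch rather than part of the answer above: `$demoPath` and the sample JSON text are hypothetical, and a full path is used because .NET methods resolve relative paths against the process's working directory, which can differ from PowerShell's current location. The final lines inspect the raw bytes to confirm that no BOM (EF BB BF) was written and that the copyright sign is stored as the UTF-8 sequence C2 A9.

```powershell
# Hypothetical demo file in the temp directory (full path on purpose).
$demoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'info-demo.json'

# Sample JSON text containing the copyright sign (U+00A9).
$json = '{ "CopyrightInfo": "Copyright ' + [char]0xA9 + ' CompanyName 1992" }'

# UTF8Encoding($false) means UTF-8 *without* a BOM; $true would emit one.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($demoPath, $json, $utf8NoBom)

# Inspect the raw bytes: a BOM would show up as EF BB BF at the start.
$bytes  = [System.IO.File]::ReadAllBytes($demoPath)
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

# In UTF-8, the copyright sign is stored as the two bytes C2 A9.
$hex = ($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
$hasUtf8Copyright = $hex -match 'C2 A9'

"BOM present: $hasBom; UTF-8 copyright sign found: $hasUtf8Copyright"
```

The same byte check can be pointed at the real info.json to see which of the two variants a given command actually produced.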
    

Note:

  • On reading: PowerShell recognizes Unicode BOMs automatically, but what encoding is assumed in the absence of a BOM depends on the PowerShell edition, both when reading source code and when reading files via cmdlets, such as via Get-Content:

    • Windows PowerShell assumes the system's legacy ANSI code page (aka language for non-Unicode programs).

    • PowerShell (Core) assumes UTF-8.

  • On writing: Once a file is read, PowerShell does not preserve information about an input file's original character encoding - the file content is stored in .NET strings (which are composed of in-memory UTF-16LE code units), even when the data is simply passed through the pipeline. As such, it is a file-writing cmdlet's own default encoding that is used if no -Encoding argument is specified, irrespective of where the data came from; specifically:

    • Windows PowerShell's Set-Content defaults to the system legacy ANSI encoding; unfortunately, other cmdlets have different defaults; notably, Out-File and its virtual alias, >, default to UTF-16LE ("Unicode") - see the bottom section of this answer for details.

    • PowerShell (Core) now fortunately defaults to BOM-less UTF-8, across all cmdlets.
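
The symptom in the question follows directly from these defaults. The sketch below is an illustration, not code from the question: it assumes that the system's ANSI code page is Windows-1252 (common on Western European and US-English systems; other systems differ). It encodes the copyright sign both ways and then decodes the ANSI bytes as UTF-8, which is effectively what an editor expecting UTF-8 does with a file written by Set-Content without -Encoding.

```powershell
$copyright = [string][char]0xA9            # U+00A9 COPYRIGHT SIGN

# Assumption: the ANSI code page is Windows-1252.
$ansi = [System.Text.Encoding]::GetEncoding(1252)
$utf8 = [System.Text.Encoding]::UTF8

$ansiBytes = $ansi.GetBytes($copyright)    # what Set-Content writes by default
$utf8Bytes = $utf8.GetBytes($copyright)    # what a UTF-8 reader expects

$ansiHex = ($ansiBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex = ($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '

# A lone A9 byte is not a valid UTF-8 sequence, so a UTF-8 decoder
# substitutes U+FFFD, the replacement character shown in the question.
$misread = $utf8.GetString($ansiBytes)
$misreadCodePoint = '{0:X4}' -f [int][char]$misread

"ANSI bytes:  $ansiHex"                    # A9
"UTF-8 bytes: $utf8Hex"                    # C2 A9
"ANSI bytes read as UTF-8: U+$misreadCodePoint"   # U+FFFD
```

Notepad shows the symbol correctly because it falls back to the ANSI code page for such files, whereas the other tools assume UTF-8.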

mklement0

Can't reproduce the issue:

$Data = @{ CopyrightInfo = "Copyright $([char]::ConvertFromUtf32(0x000000A9)) CompanyName $((Get-Date).Year)" }
$Json = ConvertTo-Json $Data
$Json |Set-Content .\Test.json
$Json = Get-Content -Raw .\Test.json
$Data = ConvertFrom-Json $Json
$Data
CopyrightInfo
-------------
Copyright © CompanyName 2022

To display the result correctly in the PowerShell console when external programs are involved, see: Displaying Unicode in Powershell

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding =  New-Object System.Text.UTF8Encoding
iRon
  • I apologise. I've updated my example to make it a bit clearer. The symbol displays as intended in PowerShell, but when I view the file with anything else, it is replaced. – Kagemand Andersen Nov 10 '22 at 11:28
  • I mentioned it in the example, but Visual Studio Code, the file viewer in SourceTree, and the file viewer in Azure DevOps. The command returns utf-8. – Kagemand Andersen Nov 10 '22 at 11:56
  • Ah, seems the issue was that PowerShell wasn't setting the correct encoding somehow. Adding explicit parameters (`Set-Content -Encoding "utf8" -Path $path`) did the trick. – Kagemand Andersen Nov 10 '22 at 12:10