
I'm trying to convert ANSI files and UTF-8 files with a BOM to UTF-8 without a BOM. I found code that does this, but in my files the word "président" from an ANSI file, for example, comes out as "prxE9sident" or "pr?sident" in UTF-8 (a problem with the accented é).

The PowerShell script that I run in my parent folder:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
$source = "path"
$destination = "some_folder"

foreach ($i in Get-ChildItem -Recurse -Force) {
    if ($i.PSIsContainer) {
        continue
    }

    $path = $i.DirectoryName -replace $source, $destination
    $name = $i.Fullname -replace $source, $destination

    if ( !(Test-Path $path) ) {
        New-Item -Path $path -ItemType directory
    }

    $content = get-content $i.Fullname

    if ( $content -ne $null ) {

        [System.IO.File]::WriteAllLines($name, $content, $Utf8NoBomEncoding)
    } else {
        Write-Host "No content from: $i"   
    }
}

Is there a way to preserve accented characters when converting ANSI and other files?

Jax22
  • [`$Null` should be on the left hand side of the equality comparison](https://stackoverflow.com/a/60996703/1701026). If your file is closed with a newline, the last item in the `$Content` array is `$Null` which evaluates to [`$False`](https://learn.microsoft.com/powershell/module/microsoft.powershell.core/about/about_booleans) for `if ( $content -ne $null ) { ...` and therefore doesn't even update your file. – iRon Sep 15 '22 at 09:30
  • It works, thank you! I now see that my problem only concerns UTF-8 files with a BOM that should be converted to UTF-8 without a BOM (not the ANSI files). Do you think there is an easy way to do that with this code? Thank you – Jax22 Sep 15 '22 at 09:53
  • I think my last comment was actually incorrect. – iRon Sep 15 '22 at 11:38
  • Try: `[System.IO.File]::WriteAllLines($name, $content, ([System.Text.Encoding]::GetEncoding(1252)))` – iRon Sep 15 '22 at 14:30
  • It works perfectly! Many thanks for your answers and your quick replies. Have a nice day! – Jax22 Sep 15 '22 at 14:37
  • If you use PowerShell 7, `utf8NoBOM` is the default encoding for `Set-Content`. – js2010 Sep 15 '22 at 20:43

1 Answer


There are actually two PowerShell gotchas in the condition:

if ( $content -ne $null ) { ...
  1. $Null should be on the left-hand side of the equality comparison operator
  2. If your file ends with a newline, the last item in the Get-Content results array is $Null

This can cause the condition to unexpectedly evaluate to $False, in which case your script does not even write the affected files.
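To illustrate the first gotcha, here is a minimal sketch of how the operand order changes what `-ne` does. The variables `$oneBlankLine` and `$noLines` are hypothetical stand-ins for Get-Content output, not values taken from the question's files.

```powershell
# Hypothetical sample data standing in for Get-Content output.
$oneBlankLine = @('')     # array holding a single empty string
$noLines      = @()       # empty array

# Collection on the left: -ne acts as a filter and returns the
# elements that are not $null, rather than a Boolean.
$filtered    = $oneBlankLine -ne $null
$isArray     = $filtered -is [array]     # the result is an array
$asCondition = [bool]$filtered           # a one-element array holding '' converts to $false

# $null on the left: a scalar comparison that yields a Boolean.
$arrayIsNotNull = $null -ne $oneBlankLine   # $true, the variable holds an array
$emptyIsNotNull = $null -ne $noLines        # $true, an empty array is still not $null
$nullIsNotNull  = $null -ne $null           # $false
```

With `$null` on the left, the `if` tests whether anything was read at all, which is what the script intends.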

Based on the additional comments, to save your files as ANSI, you should use the Windows-1252 encoding:

[System.IO.File]::WriteAllLines($name, $content, ([System.Text.Encoding]::GetEncoding(1252)))
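For the goal stated in the question (UTF-8 without a BOM as output), one possible approach is to make the input encoding explicit when reading instead of relying on the Get-Content default. The sketch below assumes that any file without a BOM is Windows-1252; the function name `Convert-ToUtf8NoBom` and its parameters are hypothetical, not part of the original script.

```powershell
# Hypothetical helper: re-encode a single file as UTF-8 without a BOM.
function Convert-ToUtf8NoBom {
    param(
        [Parameter(Mandatory)] [string] $Path,         # full path of the source file
        [Parameter(Mandatory)] [string] $Destination   # full path of the output file
    )

    # Assumption: files that carry no BOM are Windows-1252 ("ANSI").
    $ansi      = [System.Text.Encoding]::GetEncoding(1252)
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)

    # With detectEncodingFromByteOrderMarks set to $true, StreamReader uses
    # the BOM when one is present and falls back to $ansi otherwise.
    $reader = New-Object System.IO.StreamReader($Path, $ansi, $true)
    try {
        $text = $reader.ReadToEnd()
    }
    finally {
        $reader.Dispose()
    }

    # Write the decoded text back out as UTF-8, omitting the BOM.
    [System.IO.File]::WriteAllText($Destination, $text, $utf8NoBom)
}
```

Inside the question's loop this could be called as `Convert-ToUtf8NoBom -Path $i.FullName -Destination $name`. Full paths are safer here because .NET file APIs resolve relative paths against the process working directory, which can differ from the current PowerShell location.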
iRon