364

Out-File seems to force the BOM when using UTF-8:

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "UTF8" $MyPath

How can I write a file in UTF-8 with no BOM using PowerShell?
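A quick way to verify whether a BOM was written is to inspect the file's first three bytes; a UTF-8 BOM is the sequence 0xEF 0xBB 0xBF:

# Prints the first three bytes in hex; "EF BB BF" means a BOM is present.
[System.IO.File]::ReadAllBytes($MyPath)[0..2] | ForEach-Object { '{0:X2}' -f $_ }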

Update 2021

PowerShell has changed a bit since I wrote this question 10 years ago. Check multiple answers below, they have a lot of good information!

sourcenouveau
  • BOM = Byte-Order Mark. Three bytes placed at the beginning of a file (0xEF,0xBB,0xBF) that look like "ï»¿" – Signal15 Nov 26 '14 at 16:50
  • This is incredibly frustrating. Even third party modules get polluted, like trying to upload a file over SSH? BOM! "Yeah, let's corrupt every single file; that sounds like a good idea." -Microsoft. – MichaelGG Apr 01 '15 at 20:48
  • The default encoding is UTF8NoBOM starting with PowerShell version 6.0 https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/out-file?view=powershell-6#parameters – Paul Shiryaev Jul 09 '19 at 14:48
  • Talk about breaking backwards compatibility... – Dragas Jan 13 '20 at 15:31
  • I feel like it should be noted that while a BOM in a UTF-8 file does make a lot of systems choke, [it is explicitly valid in the Unicode UTF-8 spec to include one](https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8). – Bacon Bits Jun 28 '22 at 14:10
  • Thanks for this. I was pulling my hair out - tried two formats, each not adhering to the request for UTF-8: 1) `$stdcaltxt | Out-File -encoding utf8 -FilePath $stdCalFileName` and 2) `Set-Content -Path $stdCalFileName -Value $stdcaltxt -Encoding utf8`. Each yielded a different encoding, *UTF8-BOM* and *UCS-2 LE BOM*, according to Notepad++ encoding checks! – JGFMK Mar 29 '23 at 13:43

19 Answers

291

Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

$MyRawString = Get-Content -Raw $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyRawString, $Utf8NoBomEncoding)
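
Note (as several comments below point out) that the .NET call resolves relative paths against [System.Environment]::CurrentDirectory, which usually differs from PowerShell's current location, so it is safest to pass a full path; a minimal sketch:

# Resolve to an absolute filesystem path before calling the .NET API.
$FullPath = Convert-Path $MyPath
[System.IO.File]::WriteAllLines($FullPath, $MyRawString, $Utf8NoBomEncoding)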
XDS
sourcenouveau
  • Ugh, I hope that's not the only way. – Scott Muc May 24 '11 at 06:16
  • One line `[System.IO.File]::WriteAllLines($MyPath, $MyFile)` is enough. This `WriteAllLines` overload writes exactly UTF8 without BOM. – Roman Kuzmin Nov 08 '11 at 19:42
  • Created an MSDN feature request here: https://connect.microsoft.com/PowerShell/feedbackdetail/view/1137121/add-nobom-flag-to-out-file – Groostav Feb 18 '15 at 20:08
  • Note that `WriteAllLines` seems to require `$MyPath` to be absolute. – sschuberth Jan 04 '17 at 15:38
  • @sschuberth I just tried WriteAllLines with a relative path, works fine for me. Does it give you an error with a relative path? – codewario Jan 20 '17 at 20:17
  • @AlexanderMiles It "works", but the file ends up in some weird directory (not relative to the current working directory). IIRC it was the path of the PowerShell interpreter binary. – sschuberth Jan 20 '17 at 22:17
  • For me, it seems to write the file to my Desktop even if I'm currently in another directory. – xdhmoore Feb 01 '17 at 06:17
  • If you don't want an extra new line at the end of the file, you can do this: `[IO.File]::WriteAllText($MyPath, $MyFile)`. – Rosberg Linhares Jun 17 '17 at 01:03
  • @xdhmoore `WriteAllLines` gets the current directory from `[System.Environment]::CurrentDirectory`. If you open PowerShell and then change your current directory (using `cd` or `Set-Location`), then `[System.Environment]::CurrentDirectory` will not be changed and the file will end up in the wrong directory. You can work around this with `[System.Environment]::CurrentDirectory = (Get-Location).Path`. – Shayan Toqraee Sep 30 '17 at 19:00
  • This looks to be the solution still in 2018 with [Out-File from PowerShell 6](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/out-file?view=powershell-6); but Notepad++ states the file has no encoding, any hint? – watery Mar 09 '18 at 16:37
  • The $MyFile variable does not have to be an object created by Get-Content. It can also be a plain string, i.e. $MyFile = "utf8 string of some kind..." – DoubleOZ Jun 20 '18 at 12:49
  • Instead of New-Object System.Text.UTF8Encoding $False you can simply use New-Object System.Text.UTF8Encoding, since "This constructor creates an instance that does not provide a Unicode byte order mark", see https://learn.microsoft.com/en-us/dotnet/api/system.text.utf8encoding.-ctor?view=netframework-4.7.2#System_Text_UTF8Encoding__ctor – pholpar Apr 17 '19 at 09:22
  • As @RosbergLinhares noted, `WriteAllLines` adds an extra new line at the end of a file. But to make `WriteAllText` work you have to use the `-Raw` parameter for `Get-Content`, otherwise all text will be squashed into a single line. `$fileContent = Get-Content -Raw "$fileFullName"; [System.IO.File]::WriteAllText($fileFullName, $fileContent)` – PolarBear Jul 26 '19 at 08:55
106

The proper way as of now is to use the solution recommended by @Roman Kuzmin in the comments to @M. Dudley's answer:

[IO.File]::WriteAllLines($filename, $content)

(I've also shortened it a bit by stripping the unnecessary System namespace qualifier - it is resolved automatically by default.)

ForNeVeR
  • This (for whatever reason) did not remove the BOM for me, whereas the accepted answer did – Liam Jun 17 '16 at 10:31
  • @Liam, probably some old version of PowerShell or .NET? – ForNeVeR Jun 17 '16 at 14:58
  • I believe older versions of the .NET WriteAllLines function did write the BOM by default. So it could be a version issue. – codewario Jan 23 '17 at 16:38
  • @AlexanderMiles best I can tell from [.NET 2.0 documentation](https://msdn.microsoft.com/en-us/library/92e05ft3(v=vs.80).aspx), it still uses BOMless UTF-8 there. – ForNeVeR Jan 24 '17 at 03:37
  • Can confirm this writes UTF8 with no BOM on Win10 / .NET 4.6. But it still needs an absolute path. – BobHy Sep 24 '17 at 17:09
  • Confirmed this writes with a BOM in PowerShell 3, but without a BOM in PowerShell 4. I had to use M. Dudley's original answer. – chazbot7 Oct 30 '17 at 22:31
  • So it works on Windows 10 where it's installed by default. :) Also, suggested improvement: `[IO.File]::WriteAllLines(($filename | Resolve-Path), $content)` – Johny Skovdal Jan 12 '18 at 07:05
  • In PowerShell 5.1 on Win 10: `[IO.File]::WriteAllLines("c:\users\user\file.txt", $content)` gives me `Cannot find an overload for "WriteAllLines" and the argument count "2"` – duct_tape_coder Apr 07 '21 at 22:32
77

I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work...

Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext

For me, this results in a UTF-8 file without a BOM, regardless of the source format.
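
Be aware, though (as the comments below warn), that -Encoding ASCII silently replaces non-ASCII characters with literal ? characters; a quick check in Windows PowerShell illustrates the data loss:

'Naïveté' | Out-File -Encoding ASCII test.txt
Get-Content test.txt   # -> Na?vet?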

Lenny
  • This worked for me, except I used `-encoding utf8` for my requirement. – Just Rudy Jan 12 '17 at 14:53
  • Thank you very much. I am working with dump logs of a tool - which had tabs inside it. UTF-8 was not working. ASCII solved the problem. Thanks. – user1529294 Apr 07 '17 at 05:50
  • Yes, `-Encoding ASCII` avoids the BOM problem, but you obviously only get _7-bit ASCII characters_. Given that ASCII is a subset of UTF-8, the resulting file is technically also a valid UTF-8 file, but _all non-ASCII characters in your input will be converted to literal `?` characters_. – mklement0 Apr 07 '17 at 13:51
  • **Warning:** Definitely not. This deletes all non-ASCII characters and replaces them with question marks. Don't do this or you will lose data! (Tried with PS 5.1 on Windows 10) – ygoe Apr 13 '22 at 11:51
60

Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition (v6+), UTF-8 without BOM is the default encoding, across all cmdlets.

  • In other words: If you're using PowerShell [Core] version 6 or higher, you get BOM-less UTF-8 files by default (which you can also explicitly request with -Encoding utf8 / -Encoding utf8NoBOM, whereas you get UTF-8 with a BOM via -Encoding utf8BOM).

  • If you're running Windows 10 or above and you're willing to switch to BOM-less UTF-8 encoding system-wide - which has far-reaching consequences, however - even Windows PowerShell can be made to use BOM-less UTF-8 consistently - see this answer.


To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):

  • A simple (non-streaming) PowerShell-native alternative is to use New-Item, which (curiously) creates BOM-less UTF-8 files by default even in Windows PowerShell:

    # Note the use of -Raw to read the file as a whole.
    # Unlike with Set-Content / Out-File *no* trailing newline is appended.
    $null = New-Item -Force $MyPath -Value (Get-Content -Raw $MyPath)
    
    • Note: To save the output from arbitrary commands in the same format as Out-File would, pipe to Out-String first; e.g.:

       $null = New-Item -Force Out.txt -Value (Get-ChildItem | Out-String) 
      
  • For convenience, below is advanced custom function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

    • you can use it just like Out-File in a pipeline.
    • input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.
    • an additional -UseLF switch allows you to use Unix-format LF-only newlines ("`n") instead of the Windows-format CRLF newlines ("`r`n") you normally get.

Example:

(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath # Add -UseLF for Unix newlines

Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before sending the result through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for 2 reasons: (a) the whole file must fit into memory and (b) if the command is interrupted, data will be lost.

A note on memory use:

  • M. Dudley's own answer and the New-Item alternative above require that the entire file contents be built up in memory first, which can be problematic with large input sets.
  • The function below does not require this, because it is implemented as a proxy (wrapper) function (for a concise summary of how to define such functions, see this answer).

Source code of function Out-FileUtf8NoBom:

Note: The function is also available as an MIT-licensed Gist, and only the latter will be maintained going forward.

You can install it directly with the following command (while I can personally assure you that doing so is safe, you should always check the content of a script before directly executing it this way):

# Download and define the function.
irm https://gist.github.com/mklement0/8689b9b5123a9ba11df7214f82a673be/raw/Out-FileUtf8NoBom.ps1 | iex
function Out-FileUtf8NoBom {

  <#
  .SYNOPSIS
    Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).

  .DESCRIPTION

    Mimics the most important aspects of Out-File:
      * Input objects are sent to Out-String first.
      * -Append allows you to append to an existing file, -NoClobber prevents
        overwriting of an existing file.
      * -Width allows you to specify the line width for the text representations
        of input objects that aren't strings.
    However, it is not a complete implementation of all Out-File parameters:
      * Only a literal output path is supported, and only as a parameter.
      * -Force is not supported.
      * Conversely, an extra -UseLF switch is supported for using LF-only newlines.

  .NOTES
    The raison d'être for this advanced function is that Windows PowerShell
    lacks the ability to write UTF-8 files without a BOM: using -Encoding UTF8 
    invariably prepends a BOM.

    Copyright (c) 2017, 2022 Michael Klement <mklement0@gmail.com> (http://same2u.net), 
    released under the [MIT license](https://spdx.org/licenses/MIT#licenseText).

  #>

  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, Position = 0)] [string] $LiteralPath,
    [switch] $Append,
    [switch] $NoClobber,
    [AllowNull()] [int] $Width,
    [switch] $UseLF,
    [Parameter(ValueFromPipeline)] $InputObject
  )

  begin {

    # Convert the input path to a full one, since .NET's working dir. usually
    # differs from PowerShell's.
    $dir = Split-Path -LiteralPath $LiteralPath
    if ($dir) { $dir = Convert-Path -ErrorAction Stop -LiteralPath $dir } else { $dir = $pwd.ProviderPath }
    $LiteralPath = [IO.Path]::Combine($dir, [IO.Path]::GetFileName($LiteralPath))
    
    # If -NoClobber was specified, throw an exception if the target file already
    # exists.
    if ($NoClobber -and (Test-Path $LiteralPath)) {
      Throw [IO.IOException] "The file '$LiteralPath' already exists."
    }
    
    # Create a StreamWriter object.
    # Note that we take advantage of the fact that the StreamWriter class by default:
    # - uses UTF-8 encoding
    # - without a BOM.
    $sw = New-Object System.IO.StreamWriter $LiteralPath, $Append
    
    $htOutStringArgs = @{}
    if ($Width) { $htOutStringArgs += @{ Width = $Width } }

    try { 
      # Create the script block with the command to use in the steppable pipeline.
      $scriptCmd = { 
        & Microsoft.PowerShell.Utility\Out-String -Stream @htOutStringArgs | 
          . { process { if ($UseLF) { $sw.Write(($_ + "`n")) } else { $sw.WriteLine($_) } } }
      }  
      
      $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
      $steppablePipeline.Begin($PSCmdlet)
    }
    catch { throw }

  }

  process
  {
    $steppablePipeline.Process($_)
  }

  end {
    $steppablePipeline.End()
    $sw.Dispose()
  }

}
mklement0
  • Example usage from a utf8BOM file to plain utf8: `$null = New-Item -Force "$env:ProgramData\ssh\administrators_authorized_keys" -Value (Get-Content -Path "$env:ProgramData\ssh\administrators_authorized_keys" | Out-String)` – nhooyr Dec 15 '22 at 13:37
  • @nhooyr, it's better to use `$null = New-Item -Force $MyPath -Value (Get-Content -Raw $MyPath)` (much faster, and preserves the existing newline format) - I've updated the answer. – mklement0 Dec 15 '22 at 13:57
31

Starting with version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.

So the example above simply becomes:

$MyFile | Out-File -Encoding UTF8NoBOM $MyPath
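
The same encoding name works with Set-Content; for example:

$MyFile | Set-Content -Encoding UTF8NoBOM -Path $MyPath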
sc911
  • Nice. FYI check your version with `$PSVersionTable.PSVersion` – KCD Oct 29 '19 at 02:48
  • Worth noting that in PowerShell [Core] v6+ `-Encoding UTF8NoBOM` is never _required_, because it is the _default_ encoding. – mklement0 Oct 25 '20 at 20:58
  • https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.management/set-content?view=powershell-7.3#-encoding seems to confirm this - but I'm having issues scraping redirected DOS output to files and emitting it back out. Notepad++ gives different encodings (UCS-2 LE BOM on one, and UTF8-BOM on the other!) on two of the files I emit - when driving a screen-scraping automation/redirection to a temp outfile and extracting substrings. I am driving from a CSV with different arguments for the same command, also setting the encoding everywhere - but PowerShell seems to be ignoring that. – JGFMK Mar 29 '23 at 11:32
  • Be sure to check out this answer and my comment under the question itself here too if you are pulling your hair out over PowerShell encoding not working as expected (Set-Content and Out-File giving different encoding answers despite specifically requesting utf8!). https://stackoverflow.com/a/5596984/495157 – JGFMK Mar 29 '23 at 13:47
20

When using Set-Content instead of Out-File, you can specify the encoding Byte, which can be used to write a byte array to a file. This in combination with a custom UTF8 encoding which does not emit the BOM gives the desired result:

# This variable can be reused
$utf8 = New-Object System.Text.UTF8Encoding $false

$MyFile = Get-Content $MyPath -Raw
Set-Content -Value $utf8.GetBytes($MyFile) -Encoding Byte -Path $MyPath

The difference from using [IO.File]::WriteAllLines() or similar is that it should work fine with any type of item and path, not only actual file paths.
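
Note that -Encoding Byte exists only in Windows PowerShell; PowerShell (Core) 6+ removed it in favor of the -AsByteStream switch, so there the equivalent would be:

# PowerShell (Core) 6+: -Encoding Byte no longer exists; use -AsByteStream instead
Set-Content -Value $utf8.GetBytes($MyFile) -AsByteStream -Path $MyPath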

Lucero
  • Nice - works great with strings (which may be all that is needed and certainly meets the requirements of the question). In case you need to take advantage of the formatting that `Out-File`, unlike `Set-Content`, provides, pipe to `Out-String` first; e.g., `$MyFile = Get-ChildItem | Out-String` – mklement0 Oct 25 '20 at 21:06
6

This script converts all .txt files in DIRECTORY1 to UTF-8 without a BOM and outputs them to DIRECTORY2:

foreach ($i in ls -name DIRECTORY1\*.txt)
{
    $file_content = Get-Content "DIRECTORY1\$i";
    [System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
}
jamhan
  • This one fails without any warning. What version of PowerShell should I use to run it? – darksoulsong Sep 08 '13 at 13:34
  • The WriteAllLines solution works great for small files. However, I need a solution for larger files. Every time I try to use this with a larger file I'm getting an OutOfMemory error. – BermudaLamb Mar 25 '15 at 15:44
6

Important: this only works if an extra space or newline at the start is not a problem for your use case of the file
(e.g. if it is an SQL file, Java file or human-readable text file).

One could use a combination of creating an empty (non-UTF8 or ASCII (UTF8-compatible)) file and appending to it (replace $str with gc $src if the source is a file):

" "    |  out-file  -encoding ASCII  -noNewline  $dest
$str  |  out-file  -encoding UTF8   -append     $dest

as one-liner

replace $dest and $str according to your use case:

$_ofdst = $dest ; " " | out-file -encoding ASCII -noNewline $_ofdst ; $src | out-file -encoding UTF8 -append $_ofdst

as simple function

function Out-File-UTF8-noBOM { param( $str, $dest )
  " "    |  out-file  -encoding ASCII  -noNewline  $dest
  $str  |  out-file  -encoding UTF8   -append     $dest
}

using it with a source file (note that arguments to PowerShell functions are space-separated, not comma-separated):

Out-File-UTF8-noBOM  (gc $src)  $dest

using it with a string:

Out-File-UTF8-noBOM  $str  $dest
  • optionally: continue appending with Out-File:

    "more foo bar"  |  Out-File -encoding UTF8 -append  $dest
    
Andreas Covidiot
6

Old question, new answer:

While the "old" powershell writes a BOM, the new platform-agnostic variant does behave differently: The default is "no BOM" and it can be configured via switch:

-Encoding

Specifies the type of encoding for the target file. The default value is utf8NoBOM.

The acceptable values for this parameter are as follows:

  • ascii: Uses the encoding for the ASCII (7-bit) character set.
  • bigendianunicode: Encodes in UTF-16 format using the big-endian byte order.
  • oem: Uses the default encoding for MS-DOS and console programs.
  • unicode: Encodes in UTF-16 format using the little-endian byte order.
  • utf7: Encodes in UTF-7 format.
  • utf8: Encodes in UTF-8 format.
  • utf8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM)
  • utf8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM)
  • utf32: Encodes in UTF-32 format.

Source: https://learn.microsoft.com/de-de/powershell/module/Microsoft.PowerShell.Utility/Out-File?view=powershell-7 Emphasis mine
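
For example, in PowerShell 6+ (where utf8NoBOM is already the default, shown explicitly here for clarity):

'some text' | Out-File out.txt                         # default: UTF-8 without BOM
'some text' | Out-File -Encoding utf8NoBOM out.txt    # the same, made explicit
'some text' | Out-File -Encoding utf8BOM out-bom.txt  # opt back in to a BOM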

JensG
3

For PowerShell 5.1, enable this setting:

Control Panel, Region, Administrative, Change system locale, Use Unicode UTF-8 for worldwide language support

Then enter this into PowerShell:

$PSDefaultParameterValues['*:Encoding'] = 'Default'

Alternatively, you can upgrade to PowerShell 6 or higher.

https://github.com/PowerShell/PowerShell

Zombo
  • To spell it out: This is a _system-wide_ setting that makes Windows PowerShell _default_ to BOM-less UTF-8 _across all cmdlets_, which may or may not be desired, not least because the feature is still in beta (as of this writing) and can break legacy console applications - see [this answer](https://stackoverflow.com/a/65192064/45375) for background information. – mklement0 Dec 08 '20 at 17:23
2

I would say to just use the Set-Content command; nothing else is needed.

The PowerShell version on my system is:

PS C:\Users\XXXXX> $PSVersionTable.PSVersion | fl


Major         : 5
Minor         : 1
Build         : 19041
Revision      : 1682
MajorRevision : 0
MinorRevision : 1682

PS C:\Users\XXXXX>

So you would need something like the following.

PS C:\Users\XXXXX> Get-Content .\Downloads\finddate.txt
Thursday, June 23, 2022 5:57:59 PM
PS C:\Users\XXXXX> Get-Content .\Downloads\finddate.txt | Set-Content .\Downloads\anotherfile.txt
PS C:\Users\XXXXX> Get-Content .\Downloads\anotherfile.txt
Thursday, June 23, 2022 5:57:59 PM
PS C:\Users\XXXXX>

Now when we check the file (anotherfile.txt), it is UTF-8 (screenshot omitted).

PS: To answer the comment query about the foreign-character issue: the contents of the file "testfgnchar.txt", which contains foreign characters, were copied to "findfnchar2.txt" using the following command.

PS C:\Users\XXXXX> Get-Content .\testfgnchar.txt | Set-Content findfnchar2.txt
PS C:\Users\XXXXX>

(Screenshot omitted.)

Note: there are currently newer versions of PowerShell than the one I used for this answer.

  • This seemed to work at first, but this actually uses the user’s ANSI code page and replaces other symbols with closest equivalents (e. g. š → s) or question marks. Using `set-content -encoding utf8` works. – Chortos-2 Mar 21 '23 at 13:44
  • @Chortos-2, thanks for commenting. I am more concerned to save the file in "UTF-8" format strictly. If I use "set-content -encoding utf8", it saves the file in "UTF-8-BOM" format – Pravanjan Hota Apr 05 '23 at 08:11
  • Ah, that’s true; I didn’t notice. But then that means this command is completely unsuitable for the task, because without `-encoding`, it doesn’t use UTF-8 _at all,_ neither with BOM nor without. – Chortos-2 Apr 06 '23 at 11:19
  • @Chortos-2, Each system or user will have their own language set during installation. The answer I have shared is for Language English(United States) which is on my system and same is also showing in my Regional Language. I feel this language is most commonly used during installation. Hence that could be the blocker here, but could not help further. What Language system is showing in your system? Press Windows Key and I on your machine to find out the same. – Pravanjan Hota Apr 07 '23 at 12:11
  • The point is that this command uses the ANSI code page, not UTF-8, which was explicitly requested in the question (unless you’ve set ANSI to be UTF-8 as per Zombo’s answer). On your English system, try `echo āčķʃλшא⁴ℝ→⅛≈あ子 | set-content file.txt`, and you’ll see none of the characters were preserved. The same problem is pointed out for other PowerShell command in the comments to other answers. It’s certainly good to know that `set-content` defaults to encoding Latin in single bytes, but it is very different from UTF-8 as originally requested. – Chortos-2 Apr 08 '23 at 21:00
  • For the echo command with chars supplied, I could save it directly in a file and have no impact. However, powershell CLI can not identify them while doing echo and replaces them all with question(?) marks. Also trying to read from the saved file, could not read it as per my language set. PS > Get-Content .\testfgnchar.txt ÄÄķʃλш×â´â„→⅛≈ã‚å­ > I would say it is a system to system variation and local language install issue. Users can adopt as per their local language installation choice with a little research and whatever suites them. – Pravanjan Hota Apr 10 '23 at 14:00
  • It still works for me. Screen-shot is attached in the PS section. – Pravanjan Hota Apr 10 '23 at 15:05
  • I’m afraid I have no idea what the screenshot illustrates, and you confirm that you cannot produce UTF-8 from the command line. Here’s another test: run `echo Naïveté | set-content file.txt` on the command line and open the file in Notepad++. It won’t be in UTF-8; it will be in Windows-1252, which is your system’s ANSI encoding. – Chortos-2 Apr 11 '23 at 15:57
  • This is a question about a very clearly defined character encoding, and this answer fails to provide it. It _is_ possible to configure a modern Windows system to use UTF-8 as the ANSI encoding, and Zombo’s answer shows how. But this answer makes no attempt to even recognize this, and it isn’t quite what the question asked for, either, as it is a system-wide setting instead of an individual command. – Chortos-2 Apr 11 '23 at 15:59
  • `get-content` similarly uses the ANSI encoding, as you confirm you saw in your test. If the screenshot shows the result of `get-content a.txt | set-content b.txt`, it merely read the file as Windows-1252 and wrote it back as Windows-1252, which resulted in a byte-by-byte copy. UTF-8 was not involved at any point in the process. The very reason for this question is that the encoding varies depending on Windows settings, so a reliable way to use UTF-8 was requested. – Chortos-2 Apr 11 '23 at 16:10
  • You can further confirm this by comparing the behaviour of `get-content` and `set-content` without parameters and with `-encoding utf8` (or another Unicode encoding of your choice). The BOM is not the only difference, and in particular, `get-content -encoding utf8 testfgnchar.txt` will show you the correct characters. – Chortos-2 Apr 11 '23 at 16:15
  • Question is asking to read and save from file to file and secondly that is what is shown in the screen-shot that gets saved in UTF-8. What you are asking is different and that echo chars(as per Google translator seems French letters) from a terminal to a target file. There might be different answer to that question. – Pravanjan Hota Apr 17 '23 at 09:55
  • Your screenshot shows absolutely no PowerShell; just two identical files. The question does show a `get-content` invocation in the example, but the title and text very clearly ask about the text encoding used by the write command, regardless of what produced the text in the first place. It also doesn’t say that their original file was UTF-8 to begin with; maybe they’re using `get-content` precisely because they have an ANSI-encoded file that they want to convert to UTF-8. If one wanted to just copy a file, there are simpler ways to do it without involving `get-content`. – Chortos-2 Apr 17 '23 at 12:10
  • Have you tried my suggestions? UTF-8 is not involved in your process at any point at all. If you still refuse to see this, there’s no point in continuing this discussion. The comments above are enough to make it clear to other people who may stumble upon your answer. – Chortos-2 Apr 17 '23 at 12:13
  • Oh, and as pointed out in the comments to the question, later PowerShell versions (supposedly 6.0+) in fact default to `UTF8noBOM` in various commands including `get-content` and `set-content`, which makes this answer work on those versions—just as well as the command quoted in the question. My bad: I should’ve made it clear that my comment only applies to older PowerShell, including the default PowerShell 5.1 that ships with Windows; but your comments show you’re testing on an older version, too, so everything I said above still stands. – Chortos-2 Apr 17 '23 at 12:20
  • In fact, you do seem to understand that it isn’t using UTF-8 (or any kind of Unicode encoding), as you said this yourself (emphasis mine): “could not read it _as per my language set_”. If that’s the case, I’m not sure why we’re even having an argument. – Chortos-2 Apr 17 '23 at 12:34
1

Change multiple files by extension to UTF-8 without BOM:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach($i in ls -recurse -filter "*.java") {
    $MyFile = Get-Content $i.fullname 
    [System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
}
Jaume Suñer Mut
1
    # $FilePath is assumed to hold the path of the file to check.
    [System.IO.FileInfo] $file = Get-Item -Path $FilePath
    $sequenceBOM = New-Object System.Byte[] 3 
    $reader = $file.OpenRead() 
    $bytesRead = $reader.Read($sequenceBOM, 0, 3) 
    $reader.Dispose() 
    #A UTF-8+BOM string will start with the three following bytes. Hex: 0xEF0xBB0xBF, Decimal: 239 187 191 
    if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191) 
    { 
        $utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False) 
        [System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding) 
        Write-Host "Remove UTF-8 BOM successfully" 
    } 
    Else 
    { 
        Write-Warning "Not UTF-8 BOM file" 
    }  

Source: How to remove UTF8 Byte Order Mark (BOM) from a file using PowerShell

frank tan
1

If you want to use [System.IO.File]::WriteAllLines(), you should cast the second parameter to String[] (if the type of $MyFile is Object[]), and also specify an absolute path with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), like:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
[System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)

If you want to use [System.IO.File]::WriteAllText(), sometimes you should pipe the second parameter through Out-String to add CRLFs to the end of each line explicitly (especially when you use them with ConvertTo-Csv):

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
[System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)

Or you can use [Text.Encoding]::UTF8.GetBytes() with Set-Content -Encoding Byte:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"

see: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM

SATO Yusuke
  • Good pointers; suggestions: the simpler alternative to `$ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath)` is `Convert-Path $MyPath`; if you want to ensure a trailing CRLF, simply use `[System.IO.File]::WriteAllLines()` even with a _single_ input string (no need for `Out-String`). – mklement0 Feb 19 '18 at 16:05
0

One technique I utilize is to redirect output to an ASCII file using the Out-File cmdlet.

For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output will be in UTF-16 which is not recognized by SQLPlus. To work around this:

sqlplus -s / as sysdba "@create_sql_script.sql" |
Out-File -FilePath new_script.sql -Encoding ASCII -Force

The generated script can then be executed via another SQLPlus session without any Unicode worries:

sqlplus / as sysdba "@new_script.sql" |
tee new_script.log

Update: As others have pointed out, this will drop non-ASCII characters. Since the user asked for a way to "force" conversion, I assume they do not care about that, perhaps because their data does not contain such characters.

If you care about the preservation of non-ASCII characters, this is not the answer for you.

Erik Anderson
  • Yes, `-Encoding ASCII` avoids the BOM problem, but you obviously only get support for _7-bit ASCII characters_. Given that ASCII is a subset of UTF-8, the resulting file is technically also a valid UTF-8 file, but _all non-ASCII characters in your input will be converted to literal `?` characters_. – mklement0 Feb 19 '18 at 17:03
  • This answer needs more votes. The sqlplus incompatibility with BOM is a cause of [many headaches](https://stackoverflow.com/questions/10758094/is-it-possible-to-run-a-sqlplus-script-on-a-file-encoded-as-utf-8-with-bom/49019748#49019748). – Amit Naidu Mar 08 '18 at 00:06
  • @AmitNaidu No, this is the wrong answer, because it won't work if the text has any non-ASCII characters: any accents, umlauts, Oriental/Cyrillic, etc. – Joel Coehoorn Feb 10 '22 at 05:19
  • @JoelCoehoorn This is a correct answer according to what the user asked. Since the user asked for a way to "force", they're not expecting any issues or probably don't care, because the source doesn't use any non-ASCII characters. For those who do care about the preservation of those characters, this will not work. – Erik Anderson Feb 11 '22 at 18:43
0

I had the same issue in PowerShell and fixed it with this:

$PSDefaultParameterValues['*:Encoding'] = 'utf8'
Nader Gharibian Fard
0

I used this method to edit a UTF-8-no-BOM file, and it generated a file with the correct encoding:

$fileD = "file.xml"
(Get-Content $fileD) | ForEach-Object { $_ -replace 'replace text',"new text" } | out-file "file.xml" -encoding ASCII

I was skeptical of this method at first, but it surprised me and worked!

Tested with PowerShell version 5.1.

-3

You could use the following to get UTF-8 without a BOM:

$MyFile | Out-File -Encoding ASCII
Robin Wang
  • No, it will convert the output to the current ANSI codepage (cp1251 or cp1252, for example). It is not UTF-8 at all! – ForNeVeR Oct 05 '15 at 15:05
  • Thanks Robin. This may not have worked for writing a UTF-8 file without the BOM, but the -Encoding ASCII option removed the BOM. That way I could generate a bat file for gvim. The .bat file was tripping up on the BOM. – Greg Dec 10 '15 at 22:34
  • @ForNeVeR: You're correct that encoding `ASCII` is not UTF-8, but it's also not the current ANSI codepage - you're thinking of `Default`; `ASCII` truly is 7-bit ASCII encoding, with codepoints >= 128 getting converted to literal `?` instances. – mklement0 Jan 21 '16 at 06:01
  • @mklement0 AFAIK `ASCII` really means the default single-byte encoding in this API and generally in Windows. Yes, it is not in sync with the official ASCII definition, but is just a historical legacy. – ForNeVeR Jan 21 '16 at 09:03
  • @ForNeVeR: You're probably thinking of "ANSI" or "_extended_ ASCII". Try this to verify that `-Encoding ASCII` is indeed 7-bit ASCII only: `'äb' | out-file ($f = [IO.Path]::GetTempFilename()) -encoding ASCII; '?b' -eq $(Get-Content $f; Remove-Item $f)` - the `ä` has been transliterated to a `?`. By contrast, `-Encoding Default` ("ANSI") would correctly preserve it. – mklement0 Jan 21 '16 at 15:07
  • @rob This is the perfect answer for everybody who just doesn't need UTF-8 or anything else that is different from ASCII and is not interested in understanding encodings and the purpose of Unicode. You can **use** it as UTF-8 because the equivalent UTF-8 characters for all ASCII characters are identical (meaning converting an ASCII file to a UTF-8 file results in an identical file, if it gets no BOM). For all who have non-ASCII characters in their text this answer is just false and misleading. – TNT Aug 25 '16 at 19:25
-4

This one works for me (use "Default" instead of "UTF8"):

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "Default" $MyPath

The result is ASCII without BOM.

  • Per the [Out-File documentation](https://technet.microsoft.com/en-us/library/hh849882.aspx) specifying the `Default` encoding will use the system's current ANSI code page, which is not UTF-8, as I required. – sourcenouveau May 06 '15 at 13:21
  • This does seem to work for me, at least for Export-CSV. If you open the resulting file in a proper editor, the file encoding is UTF-8 without BOM, and not Western Latin ISO 9 as I would have expected with ASCII – eythort Aug 05 '16 at 11:00
  • Many editors open the file as UTF-8 if they can't detect the encoding. – emptyother Jul 22 '17 at 09:40