What's the easiest way to convert XML from UTF16 to a UTF8 encoded file?

David Gardiner

3 Answers

Well, I guess the easiest way is to just not care about whether the file is XML or not and simply convert:

Get-Content file.foo -Encoding Unicode | Set-Content -Encoding UTF8 newfile.foo

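This only produces valid XML when the file has no declaration line such as

<?xml version="1.0" encoding="UTF-16"?>

because the declaration is copied verbatim and would then contradict the file's actual UTF-8 encoding.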
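
If the file does carry a declaration, one possible workaround is to rewrite its `encoding` value during the same conversion. This is only a minimal sketch: it assumes PowerShell 3.0 or later (for `-Raw`), the function name and parameters are hypothetical, and the regex only handles a declaration at the very start of the file.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Convert-Utf16XmlToUtf8 {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )

    # Read the whole UTF-16 file as a single string (-Raw needs PowerShell 3.0+).
    $text = Get-Content -LiteralPath $Source -Encoding Unicode -Raw

    # If the file starts with an XML declaration, point its encoding at UTF-8.
    $pattern = '^(<\?xml[^>]*?encoding\s*=\s*)(["''])[^"'']*\2'
    $text = $text -replace $pattern, '${1}${2}UTF-8${2}'

    # Write the text back out as UTF-8.
    Set-Content -LiteralPath $Destination -Value $text -Encoding UTF8
}

# Example: Convert-Utf16XmlToUtf8 -Source 'file.foo' -Destination 'newfile.foo'
```

Files without a declaration pass through unchanged apart from the encoding. Note that `Set-Content -Encoding UTF8` writes a byte-order mark in Windows PowerShell, while PowerShell 6 and later write UTF-8 without one by default.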

Joey
  • If you want to do it without creating a new file, you can wrap the Get-Content in parentheses: (Get-Content File.foo) | Set-Content -Encoding UTF8 File.foo – Jaykul Jun 12 '11 at 03:20
  • How do you do this for files in a directory and subdirectories? – stormwild Sep 01 '12 at 09:02
  • `gci -rec -fi * | %{(gc $_ -enc unicode) | set-content -enc utf8 $_.fullname}`. Fairly straightforward, actually. – Joey Sep 01 '12 at 13:31
  • @Joey, a small correction on your powershell script... `gci -rec -fi * | %{(gc $_.fullname -enc unicode) | set-content -enc utf8 $_.fullname}` – Tim Friesen Oct 09 '12 at 19:42
  • No need to use `FullName` there. `Get-Content` knows how to deal with a `FileInfo`. – Joey Oct 09 '12 at 19:44
  • @Joey, unfortunately for me it was complaining that it could not find the path. I think it must be converting the FileInfo object to a string. `Get-Content : Cannot find path 'C:\WorkingFolder\FileName.txt' because it does not exist. At line:1 char:26 + gci -rec -fi *.txt | %{(gc <<<< $_ -enc ascii) | set-content -enc utf8 $_.fullname} + CategoryInfo : ObjectNotFound: (C:\WorkingFolder\FileName.txt:String) [Get-Content], ItemNotFoundException + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand` FileName.txt was in a subfolder of C:\WorkingFolder. – Tim Friesen Oct 09 '12 at 20:33
  • does this get you: `UTF8 with BOM`? see here: https://stackoverflow.com/q/5596982/1747983 – Tilo Aug 10 '22 at 21:49
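
Pulling the comments above together, a recursive variant might look like the sketch below. It passes `FullName` explicitly, which sidesteps the path-resolution error reported above, and it writes through .NET so that no byte-order mark is emitted. The function name, its parameter, and the `*.xml` filter are assumptions for illustration. It also assumes every matched file really is UTF-16, and it overwrites files in place, so try it on a copy first.

```powershell
# Hypothetical helper; the name, parameter, and *.xml filter are illustrative.
function Convert-TreeToUtf8 {
    param([Parameter(Mandatory)] [string] $Root)

    # UTF8Encoding($false) means "do not emit a byte-order mark".
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)

    Get-ChildItem -Path $Root -Recurse -Filter '*.xml' |
        Where-Object { -not $_.PSIsContainer } |
        ForEach-Object {
            # Use FullName so the path does not depend on the current directory.
            $text = [System.IO.File]::ReadAllText($_.FullName, [System.Text.Encoding]::Unicode)
            [System.IO.File]::WriteAllText($_.FullName, $text, $utf8NoBom)
        }
}

# Example: Convert-TreeToUtf8 -Root 'C:\WorkingFolder'
```

The caveat from the answer still applies: an existing `encoding="UTF-16"` declaration is copied as-is and would need to be updated separately.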
This may not be optimal, but it works. Simply load the XML and write it back out to a file. The XML declaration is lost in the process, though, so it has to be re-added.

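# Re-save every XML file in the current directory as UTF-8.
$files = Get-ChildItem "*.xml"
foreach ( $file in $files )
{
    $doc = New-Object System.Xml.XmlDocument
    $doc.PreserveWhitespace = $true
    # Pass the full path: .NET does not follow PowerShell's current location.
    $doc.Load( $file.FullName )

    # OuterXml of the root element drops the original declaration, so prepend a new one.
    $xml = '<?xml version="1.0" encoding="utf-8"?>' + $doc.DocumentElement.OuterXml

    # Write the result next to the original as <name>.new.
    $newFile = $file.FullName + ".new"
    Set-Content -Encoding UTF8 $newFile $xml
}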
Joey
Ben Laan
Try this solution, which rewrites the XML declaration and lets XmlDocument.Save re-encode the file to match:

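$encoding = "UTF-8" # most encodings should work
$files = Get-ChildItem "*.xml"
foreach ( $file in $files )
{
    # Use the full path so the file is found regardless of the current directory.
    [xml] $xmlDoc = Get-Content $file.FullName
    # Overwrite the existing XML declaration; Save() encodes the file to match it.
    $xmlDoc.xml = $($xmlDoc.CreateXmlDeclaration("1.0", $encoding, "")).Value
    $xmlDoc.Save($file.FullName)
}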

You may want to look at the XmlDocument documentation for more on CreateXmlDeclaration.
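
For comparison, here is a minimal sketch that drives an `XmlWriter` explicitly, which also gives control over the byte-order mark. It assumes the input is well-formed XML and that full paths are passed in (the .NET methods do not follow PowerShell's current location); the function name and parameters are hypothetical. As far as I know, `XmlDocument.Save(XmlWriter)` generates the declaration from the writer's own encoding setting, so the declaration and the bytes on disk should agree.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Save-XmlAsUtf8 {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )

    $doc = New-Object System.Xml.XmlDocument
    $doc.PreserveWhitespace = $true
    # XmlDocument.Load detects the input encoding from the BOM or declaration.
    $doc.Load($Source)

    $settings = New-Object System.Xml.XmlWriterSettings
    # $false = no byte-order mark; pass $true if a BOM is wanted.
    $settings.Encoding = New-Object System.Text.UTF8Encoding($false)

    $writer = [System.Xml.XmlWriter]::Create($Destination, $settings)
    try {
        $doc.Save($writer)
    }
    finally {
        # Close flushes the output and releases the file handle.
        $writer.Close()
    }
}

# Example: Save-XmlAsUtf8 -Source 'C:\data\in.xml' -Destination 'C:\data\out.xml'
```

Unlike the in-place approach, this version does not require the source file to already contain a declaration.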

SchmitzIT
LMA1980
  • Thank you very much for caring to provide such a brief, and technically better, answer to such an old question! – gimpf Jan 30 '13 at 12:05
  • I had to get it done and I found this solution even before seeing this question. I felt it was normal to offer it. With little effort someone can even use it to copy while converting the encoding of the files. Regards. – LMA1980 Jan 31 '13 at 18:56