
I built an XML object of type System.Xml.XmlDocument.

$scheme.gettype()
IsPublic IsSerial Name BaseType                                                         
-------- -------- ---- --------                                                         
True     False    XmlDocument System.Xml.XmlNode 

I use the method save() to save it to a file.

$scheme.save()

This saves the file in format UTF-8 with BOM. The BOM causes issues with other scripts down the line.

When we open the XML file in Notepad++ and re-save it as UTF-8 (without the BOM), the other scripts down the line have no problem. So I've been asked to save the file without the BOM.
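
To confirm whether a given file actually starts with a BOM, one option is to inspect its first three bytes. The following is a minimal sketch; the function name Test-Utf8Bom is hypothetical and not part of any module, and it assumes the file is small enough to read into memory.

```powershell
# Hypothetical helper (illustration only): reports whether a file begins with
# the UTF-8 byte order mark, i.e. the bytes EF BB BF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)][string] $Path)

    # Resolve to a full path first; .NET methods do not follow PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}
```

Calling Test-Utf8Bom -Path .\out.xml returns $true when the BOM is present and $false otherwise. [System.IO.File]::ReadAllBytes() is used because it behaves the same in PowerShell 5 and PowerShell 7.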

The MS documentation for the save method states:

The value of the encoding attribute is taken from the XmlDeclaration.Encoding property. If the XmlDocument does not have an XmlDeclaration, or if the XmlDeclaration does not have an encoding attribute, the saved document will not have one either.

The MS documentation on XmlDeclaration lists encoding properties of UTF-8, UTF-16 and others. It does not mention a BOM.

Does the XmlDeclaration have an encoding property that leaves out the BOM?

PS. This behavior is identical in PowerShell 5 and PowerShell 7.

Bagheera

2 Answers


Unfortunately, when the XML declaration contains an explicit encoding="utf-8" attribute, .NET's [xml] (System.Xml.XmlDocument) type, when .Save() is given a file path, writes a UTF-8-encoded file with a BOM, which can indeed cause problems (even though it shouldn't[1]).

A request to change this has been green-lighted in principle, but is not yet implemented as of .NET 6.0 (due to a larger discussion about changing [System.Text.Encoding]::UTF8 to not use a BOM, in which case .Save() would automatically not create a BOM anymore either).

Somewhat ironically, the absence of an encoding attribute causes .Save() to create UTF-8-encoded files without a BOM.

A simple solution is therefore to remove the encoding attribute[2]; e.g.:

# Create a sample XML document:
$xmlDoc = [xml] '<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>'

# Remove the 'encoding' attribute from the declaration.
# Without this, the .Save() method below would create a UTF-8 file *with* BOM.
$xmlDoc.ChildNodes[0].Encoding = $null

# Now, saving produces a UTF-8 file *without* a BOM.
$xmlDoc.Save("$PWD/out.xml")
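
To see the difference for yourself, you could save the same document with and without the attribute and compare the first bytes of each file. This is only a sketch: the file names under the temp directory are made up, and whether the first file starts with EF BB BF depends on the .NET version in use (the behavior described above applies through .NET 6).

```powershell
# Sketch only: the file names below are hypothetical.
$dir         = [System.IO.Path]::GetTempPath()
$withAttr    = Join-Path $dir 'with-encoding.xml'
$withoutAttr = Join-Path $dir 'without-encoding.xml'

$xmlDoc = [xml] '<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>'
$xmlDoc.Save($withAttr)                  # declaration still has encoding="utf-8"

$xmlDoc.ChildNodes[0].Encoding = $null   # drop the attribute, as in the example above
$xmlDoc.Save($withoutAttr)

# Print the first three bytes of each file as hex; EF BB BF is the UTF-8 BOM.
foreach ($file in $withAttr, $withoutAttr) {
    $head = [System.IO.File]::ReadAllBytes($file)[0..2]
    '{0}: {1}' -f (Split-Path $file -Leaf), (($head | ForEach-Object { $_.ToString('X2') }) -join ' ')
}
```

Full paths are used on purpose, because .NET resolves relative paths against the process working directory rather than PowerShell's current location.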

[1] Per the XML W3C Recommendation: "entities encoded in UTF-8 MAY begin with the Byte Order Mark" [BOM].

[2] This is safe to do, because the XML W3C Recommendation effectively mandates UTF-8 as the default in the absence of both a BOM and an encoding attribute.

mklement0

As BACON explains in the comments, the string value of the Encoding attribute in the XML declaration doesn't have any bearing on how the file containing the document is encoded.

You can control this by creating either a StreamWriter or an XmlWriter with a non-BOM UTF8Encoding and then passing it to Save($writer):

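# Resolve to a full path; unlike Resolve-Path, this also works if the file doesn't exist yet
$filename = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath('path\to\output.xml')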

# Create UTF8Encoding instance, sans BOM
$encoding = [System.Text.UTF8Encoding]::new($false)

# Create StreamWriter instance
$writer = [System.IO.StreamWriter]::new($filename, $false, $encoding)

# Save using (either) writer
$scheme.Save($writer)

# Dispose of writer
$writer.Dispose()
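
If you need this in more than one place, the steps above could be wrapped in a small function. The following is a sketch, not an established cmdlet: the name Save-XmlWithoutBom and its parameters are hypothetical. It adds try/finally so the file handle is released even if Save() throws.

```powershell
# Hypothetical wrapper (name and parameters are illustrative only).
function Save-XmlWithoutBom {
    param(
        [Parameter(Mandatory)][System.Xml.XmlDocument] $Document,
        [Parameter(Mandatory)][string] $Path
    )

    # Turn a possibly relative PowerShell path into a full file-system path,
    # even if the file does not exist yet.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)

    $encoding = [System.Text.UTF8Encoding]::new($false)   # $false = do not emit a BOM
    $writer   = [System.IO.StreamWriter]::new($fullPath, $false, $encoding)
    try {
        $Document.Save($writer)
    }
    finally {
        # Always release the file handle, even if Save() throws.
        $writer.Dispose()
    }
}
```

Usage would then be a single call such as Save-XmlWithoutBom -Document $scheme -Path .\output.xml.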

Alternatively, use an [XmlWriter]:

# XmlWriter Example
$writer = [System.Xml.XmlWriter]::Create($filename, @{ Encoding = $encoding })

The second argument is an [XmlWriterSettings] object, which gives us greater control over formatting options in addition to setting the encoding explicitly:

$settings = [System.Xml.XmlWriterSettings]@{
  Encoding = $encoding
  Indent = $true
  NewLineOnAttributes = $true
}
$writer = [System.Xml.XmlWriter]::Create($filename, $settings)

#  <?xml version="1.0" encoding="utf-8"?>
#  <Config>
#    <Group
#      name="PropertyGroup">
#      <Property
#        id="1"
#        value="Foo" />
#      <Property
#        id="2"
#        value="Bar"
#        exclude="false" />
#    </Group>
#  </Config>
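
For completeness, here is a sketch of the whole XmlWriter flow, from creating the settings to saving and disposing. The document content and the output location under the temp directory are made up for illustration.

```powershell
# Sketch: the sample document and the output path are hypothetical.
$doc  = [xml] '<Config><Group name="PropertyGroup"><Property id="1" value="Foo" /></Group></Config>'
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'config.xml'

$settings = [System.Xml.XmlWriterSettings]::new()
$settings.Encoding = [System.Text.UTF8Encoding]::new($false)   # UTF-8 without a BOM
$settings.Indent   = $true

$writer = [System.Xml.XmlWriter]::Create($path, $settings)
try {
    $doc.Save($writer)
}
finally {
    $writer.Dispose()   # flushes pending output and closes the file
}
```

As with the StreamWriter variant, the writer must be disposed before other scripts read the file, otherwise buffered output may not have been written yet.
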
Mathias R. Jessen