1

I have a simple XML file. When I just load it with Get-Content and save it back, the file gets corrupted and becomes unusable. Any help, suggestions, or solutions are most welcome.

$xmlfile = 'C:\Test\stack.xml'
[xml]$xmlcontent = (Get-Content $xmlfile)
$xmlcontent.Save($xmlfile)

Below is the sample XML file that I use with the PowerShell script above. You can save it to a file for reference.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE fccconfig SYSTEM "fccconfig.dtd">

<fccconfig version="1.2.3">
   <fccdefaults>

      <!-- general -->
      <property name="FCC_CacheLocation" value="C:/Users/Public/sometestCache" overridable="true"/>
      <property name="FCC_LogFile" value="C:/Users/Public/sometestfile.log" overridable="true" />
      <!-- external site access definition -->
      <!-- <site id="013B998A65427E" overridable="true"> -->
         <!-- <parentfsc address="localhost:4567" priority="0"/> -->
         <!-- <parentfsc address="myserverhost:4444" priority="1"/> -->
         <!-- <assignment mode="parentfsc" /> -->
      <!-- </site> -->

      <site id="-987654321" overridable="true">
         <parentfsc address="http://testlink:12345/" priority="0" />
      </site>
      <!--__ANT_MARK__-->

   </fccdefaults>

   <!-- default parentfsc - this is a marker that will be overwritten by the installer -->
   <parentfsc address="address1.com:2020" priority="0" transport="lan"/>
   <parentfsc address="address1.com:2020" priority="1" transport="lan"/>

</fccconfig>

After running the script, unknown characters ([]) are added on the 2nd line of the XML file, and the spacing within the file also changes. (The original post included a screenshot of the differences between the two files.)

mklement0

2 Answers

0

Text will always be better than pictures. The square brackets added on line 2 seem to be a normal XML thing; see "How to get rid of square brackets[] after editing and saving an XML file".

Note that the file will be saved with the encoding declared on line 1. In this case, a UTF-8 BOM will be added if it wasn't there.

As for possibly preventing the reformatting, see "Writing in xml does not keep the formatting?".

js2010
  • Hi @js2010, sorry, I didn't understand what you're trying to explain. I am new to PowerShell. I also checked that link, but I think that was done in C#, whereas I am performing the save operation in PowerShell. In my case I would like to avoid the [] brackets in the XML file; also, if you check, the indentation has completely changed as well, which I would like to avoid too. – user1539205 Mar 31 '20 at 15:00
  • I don't think you can avoid the re-indenting. In a sense the save is making it "pretty". I don't know how to get rid of the brackets. I think it would be fine to leave it in. – js2010 Mar 31 '20 at 15:29
0

It's corrupting the xml file and getting unusable.

There's no corruption - the file is still readable by an XML processor and has the same content, but aspects of its formatting have changed, due to (default) behaviors built into the System.Xml.XmlDocument class (accessible via type accelerator [xml] in PowerShell):

  • (a) The non-significant whitespace in the input XML text was trimmed on reading, and on saving the elements were pretty-printed (automatically spread across multiple lines with indentation); as a result, the visual structure of the document changed (but not its content).

  • (b) [] was appended to the end of the document-type declaration (<!DOCTYPE ...[]>) to denote an empty internal subset, which is apparently invariably added when the document is saved to a file - again, there's no change in content from an XML-parsing perspective.

  • (c) The saved file uses character encoding UTF-8 with a BOM - irrespective of whether the input file had a BOM or not; the reason is the encoding="UTF-8" attribute in the XML declaration, which (unfortunately) causes the .Save() method to use a BOM; while redundant, it again shouldn't pose a problem for any XML parser.

    • Unfortunately, this redundant behavior won't change, in the interest of backward compatibility - see this GitHub issue.
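To see point (c) for yourself, you can check whether a file starts with the UTF-8 BOM bytes (EF BB BF) before and after the round trip. The following is only a minimal sketch; Test-Utf8Bom is a hypothetical helper name chosen for illustration, not a built-in cmdlet.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true if the
# file starts with the UTF-8 BOM bytes EF BB BF.
function Test-Utf8Bom {
  param([Parameter(Mandatory)][string] $LiteralPath)

  # Resolve to a full path first, because .NET's working directory
  # may differ from PowerShell's current location.
  $fullPath = Convert-Path -LiteralPath $LiteralPath

  # Reading the whole file is fine for small configuration files.
  $bytes = [IO.File]::ReadAllBytes($fullPath)

  return ($bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and
          $bytes[1] -eq 0xBB -and
          $bytes[2] -eq 0xBF)
}

# Example call (path taken from the question):
# Test-Utf8Bom 'C:\Test\stack.xml'
```

Calling it once before and once after .Save() shows whether the BOM was introduced by the save.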

Addressing (a) - preserving the original visual structure - is fairly straightforward:

$xmlfile = 'C:\Test\stack.xml'

# Create an empty XmlDocument instance...
$xmlcontent = [xml]::new()
# ... and tell it to preserve non-significant whitespace when 
#     reading from / writing to a file.
$xmlcontent.PreserveWhitespace = $true

# Load the XML text from the file.
$xmlContent.Load($xmlFile)

# ...

# Save it back to the file, with the original whitespace preserved.
$xmlcontent.Save($xmlfile)

Note: The above uses a full file path anyway, but it's important to always do that when passing paths to .NET methods, because .NET's working directory typically differs from PowerShell's.
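If you only have a relative path, one way to follow the note above is to resolve it with Convert-Path before passing it to the .NET methods. The sketch below is illustrative only: it creates a throwaway sample.xml in a temporary folder (both names are assumptions made for the example) so that it can run on its own.

```powershell
# Set up a throwaway sample file in a temporary folder (illustrative only).
$dir = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName())
$null = New-Item -ItemType Directory -Path $dir
Set-Content -LiteralPath (Join-Path $dir 'sample.xml') -Value "<root>`n   <item name=`"a`" />`n</root>"

Push-Location -LiteralPath $dir
try {
  # Convert-Path resolves the relative path against PowerShell's current
  # location and returns a full file-system path (the file must exist).
  $fullPath = Convert-Path -LiteralPath './sample.xml'

  $doc = [xml]::new()
  $doc.PreserveWhitespace = $true
  $doc.Load($fullPath)   # full path, so .NET's working directory is irrelevant
  $doc.Save($fullPath)
}
finally {
  Pop-Location
}
```

Note that Convert-Path only works for paths that already exist, which is the case when you load a file and save it back in place.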


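If you really need to address (b) and (c) as well, run the following after the above; it removes the [] from the document-type declaration and rewrites the file as UTF-8 without a BOM, which is what [IO.File]::WriteAllText() uses by default: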

[IO.File]::WriteAllText(
  $xmlfile,
  ((Get-Content -Raw $xmlfile) -replace '(?m)(?<=^<!DOCTYPE .+)\[\](?=>)')
)
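To illustrate what the regex in the -replace operation does, here is a small sketch that applies it to in-memory strings instead of a file. The sample strings are made up for the demonstration: the first mimics the DOCTYPE line as saved by .Save(), the second is a hypothetical line that contains [] outside a DOCTYPE declaration.

```powershell
# Same regex as above: match '[]' only if it is preceded, on the same line,
# by text that starts with '<!DOCTYPE ' and is directly followed by '>'.
$regex = '(?m)(?<=^<!DOCTYPE .+)\[\](?=>)'

# Mimics the DOCTYPE line after saving.
$saved = '<!DOCTYPE fccconfig SYSTEM "fccconfig.dtd"[]>'
$fixed = $saved -replace $regex
$fixed   # -> <!DOCTYPE fccconfig SYSTEM "fccconfig.dtd">

# Hypothetical line with '[]' elsewhere: the lookbehind prevents a match.
$other = '<site id="[]"/>'
$untouched = $other -replace $regex
$untouched   # -> <site id="[]"/>
```

Because -replace is given no replacement operand, each match is replaced with the empty string, i.e. removed.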
mklement0