37

I want to read in an XML file, modify an element, and then save it back to the file. What is the best way to do this while preserving the formatting and keeping the original line terminators (CRLF vs. LF)?

Here is what I have but it doesn't do that:

$xml = [xml]([System.IO.File]::ReadAllText($fileName))
$xml.PreserveWhitespace = $true
# Change some element
$xml.Save($fileName)

The problem is that extra newlines (i.e., empty lines in the XML) are removed, and afterwards the file has a mix of LF and CRLF.

Matthew M. Osborn
  • 1
    What do you mean by `preserving the format`? – manojlds Nov 17 '11 at 00:34
  • 1
    It probably won't make a difference, but have you tried `$xml = [xml](Get-Content $filename)` instead? Otherwise you might have to use the native .NET XmlDocument class and methods to load, edit, and save the file. – Ryan Nov 17 '11 at 00:34
  • 3
    @manojlds I want to preserve whitespace, newlines, tabs, etc. – Matthew M. Osborn Nov 17 '11 at 00:38
  • 1
    @Ryan Yea I have tried as well still the same problem. – Matthew M. Osborn Nov 17 '11 at 00:39
  • 1
    As an aside: `[xml]([System.IO.File]::ReadAllText($fileName))` and `[xml](Get-Content $filename)` are to be avoided, because they can result in misinterpretation of the XML document's character encoding; use `($xml = [xml]::new()).Load((Convert-Path -LiteralPath $fileName))` instead - see [this answer](https://stackoverflow.com/a/71848130/45375) for details. – mklement0 Mar 18 '23 at 14:53

5 Answers

59

You can use the PowerShell [xml] object and set $xml.PreserveWhitespace = $true, or do the same thing using .NET XmlDocument:

# NOTE: Full path to file is *highly* recommended
$f = Convert-Path '.\xml_test.xml'

# Using .NET XmlDocument
$xml = New-Object System.Xml.XmlDocument
$xml.PreserveWhitespace = $true

# Or using PS [xml] (older PowerShell versions may need to use psbase)
$xml = New-Object xml
$xml.PreserveWhitespace = $true
#$xml.psbase.PreserveWhitespace = $true  # Older PS versions

# Load with preserve setting
$xml.Load($f)
$n = $xml.SelectSingleNode('//file')
$n.InnerText = 'b'
$xml.Save($f)

Just make sure to set PreserveWhitespace before calling XmlDocument.Load or XmlDocument.LoadXml.
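
The ordering requirement can be checked without touching a file. The following is a minimal sketch, assuming a small hypothetical in-memory document; it parses the same text twice and compares how many child nodes the root element ends up with:

```powershell
# Hypothetical in-memory document with an empty line between two elements.
$text = "<root>`r`n  <a>1</a>`r`n`r`n  <b>2</b>`r`n</root>"

# Flag set BEFORE parsing: whitespace-only nodes are kept in the DOM.
$before = New-Object System.Xml.XmlDocument
$before.PreserveWhitespace = $true
$before.LoadXml($text)

# Flag set AFTER parsing: the whitespace nodes were already discarded.
$after = New-Object System.Xml.XmlDocument
$after.LoadXml($text)
$after.PreserveWhitespace = $true

$before.DocumentElement.ChildNodes.Count   # whitespace nodes plus <a> and <b>
$after.DocumentElement.ChildNodes.Count    # only <a> and <b>
```

The first count should be larger than the second, because the whitespace between elements only survives when the flag is already set at parse time.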

NOTE: This does not preserve white space between XML attributes! White space inside attribute values seems to be preserved, but white space between attributes is not. The documentation talks about preserving "white space nodes" (node.NodeType = System.Xml.XmlNodeType.Whitespace), not attributes.

Ryan
  • 2
    Very nice -- although I didn't have to use "psbase". – Gerard ONeill Aug 28 '13 at 01:08
  • 2
    This does not preserve newlines between attributes. – jpmc26 Jun 29 '16 at 17:03
  • @jpmc26 I was not aware that white space between attributes is not preserved. I did a quick test as well and it did *not* preserve white space between XML attributes. It did preserve white space in attribute values as far as I could tell though. – Ryan Jul 05 '16 at 16:48
  • Thanks for checking. Yes, I was talking about whitespace outside of the attribute values. I don't know of any better options, so the answer is still good and legitimate. I just wanted to note a shortcoming of it. – jpmc26 Jul 05 '16 at 16:50
  • How to achieve this in attributes ? How to overcome this shortcoming? – raj Mar 15 '18 at 19:40
  • 1
    @raj, as far as I can tell, this is a .NET limitation and I do not know of a work-around. – Ryan Mar 21 '18 at 16:23
  • 1
    @mklement0 Good point. I've experienced issues with current directory differing between .NET and PowerShell in the past. Full path is highly recommended. – Ryan Mar 20 '23 at 13:28
11

If you want to correct the CRLF that gets transformed to LF in text nodes when you call the Save method on the XmlDocument, you can use an XmlWriterSettings instance. This uses the same XmlWriter as MilesDavies192's answer, but also sets the encoding to UTF-8 and keeps indentation.

$xml = [xml]([System.IO.File]::ReadAllText($fileName))
$xml.PreserveWhitespace = $true

# Change some element

#Settings object will instruct how the xml elements are written to the file
$settings = New-Object System.Xml.XmlWriterSettings
$settings.Indent = $true
#NewLineChars will affect all newlines
$settings.NewLineChars ="`r`n"
#Set an optional encoding, UTF-8 is the most used (without BOM)
$settings.Encoding = New-Object System.Text.UTF8Encoding( $false )

$w = [System.Xml.XmlWriter]::Create($fileName, $settings)
try{
    $xml.Save( $w )
} finally{
    $w.Dispose()
}
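
To match the file's existing terminator rather than hard-coding CRLF, the value for NewLineChars could be derived from the original text. This is only a sketch: Get-DominantNewLine is a hypothetical helper, not part of .NET or PowerShell, and the sample string stands in for the file's content.

```powershell
# Hypothetical helper (not from the answers): returns the newline sequence
# that occurs most often in $Text; a tie, including "no newlines", yields CRLF.
function Get-DominantNewLine {
    param([string]$Text)
    $crlf = [regex]::Matches($Text, '\r\n').Count
    # Count only LF characters that are not the second half of a CRLF pair.
    $lf = [regex]::Matches($Text, '(?<!\r)\n').Count
    if ($crlf -ge $lf) { "`r`n" } else { "`n" }
}

# Sample input standing in for the file's original text.
$original = "<root>`n  <file>a</file>`n</root>`n"

$settings = New-Object System.Xml.XmlWriterSettings
$settings.NewLineChars = Get-DominantNewLine $original
```

This still writes a single terminator throughout, so a file that deliberately mixes LF and CRLF would be normalized to whichever is more common.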
Dan
  • OPs question was _preserving the format and also keep matching CR and LF_. There are cases where I don't want to touch parts of a file even if it means keeping mix of LF and CRLF. Using settings in a writer as you suggested would change all to predefined format. But that's what I actually wanted. so, thanks. – papo Jun 25 '18 at 21:26
  • While it's good to know how to control the newline format on writing, it isn't necessary to solve the OP's problem. You're actually repeating the OP's mistake: setting `.PreserveWhitespace = $true` _after_ having read the file, which is too late. – mklement0 Mar 18 '23 at 15:03
6

When XML is read, empty lines are ignored by default. To preserve them, set the PreserveWhitespace property before reading the file:

Create XmlDocument object and configure PreserveWhitespace:

$xmlDoc = [xml]::new()
$xmlDoc.PreserveWhitespace = $true

Load the document:

$xmlDoc.Load($myFilePath)

or

$xmlDoc.LoadXml($(Get-Content $myFilePath -Raw))
arielhad
  • 1
    This answer actually explains _why_ the OPs provided code did not work as expected. – CodeFox Dec 01 '21 at 07:50
  • Kudos for pointing out the real problem clearly, but it's not a good idea to suggest `$xmlDoc.LoadXml($(Get-Content $myFilePath -Raw))`, because it delegates determining the file's character encoding to `Get-Content`, which isn't XML-aware. – mklement0 Mar 18 '23 at 15:05
2

If you save using an XmlWriter, the default options are to indent with two spaces and to replace the line endings with CR/LF. You can configure these options after creating the writer, or create the writer with an XmlWriterSettings object configured for your needs.

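    $fileXML = New-Object System.Xml.XmlDocument

    # Try and read the file as XML. Let the errors go if it's not.
    $fileXML.Load($file)

    # Save through an XmlWriter, then dispose it so the file handle is released.
    $writerXML = [System.Xml.XmlWriter]::Create($file)
    try {
        $fileXML.Save($writerXML)
    } finally {
        $writerXML.Dispose()
    }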
MilesDavies192
0

I don't see the line endings changing (\r\n), except the last one goes away. However, the encoding goes from ASCII to UTF8 with BOM.

$a = get-content -raw file.xml
$a -replace '\r','r' -replace '\n','n'

<?xml version="1.0" encoding="utf-8"?>rn<Configuration>rn  <ViewDefinitions />rn</Configuration>rn

[xml]$b = get-content file.xml
$b.save('file.xml')

$a = get-content -raw file.xml
$a -replace '\r','r' -replace '\n','n'

<?xml version="1.0" encoding="utf-8"?>rn<Configuration>rn  <ViewDefinitions />rn</Configuration>

# https://gist.github.com/jpoehls/2406504
get-fileencoding file.xml

UTF8
js2010
  • The line endings _can_ change when opt-in whitespace preservation isn't used, because the _platform-native_ ones are then used on saving. So, for instance, a LF-only input file becomes a CRLF file on Windows. An ASCII-characters-only file is by definition also a UTF-8 file. That `.Save()` results in creation of a BOM is unfortunate - this may change in the future, in the context of a proposal to switch the `[Text.Encoding]::Utf8` singleton from with-BOM to no-BOM - see https://github.com/dotnet/runtime/issues/51353. – mklement0 Mar 18 '23 at 20:27