There's good information in some of the answers, but let me try to provide a systematic summary and to address your own attempts:
To preserve insignificant whitespace in an XML document (System.Xml.XmlDocument, [xml] in PowerShell), .PreserveWhitespace = $true must be set, notably before content is loaded, such as from a file.
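However, with a file you must also ensure that the file is read (loaded) and saved correctly: use the document's own .Load() and .Save() methods, with full paths, rather than Get-Content and Set-Content.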
To put it all together:
# Construct an empty [xml] instance.
$xml = [xml]::new() # In PSv4-: New-Object xml
# Instruct it to preserve whitespace when content is loaded later,
# as well as on saving.
$xml.PreserveWhitespace = $true
# Load the document from your file
# Note the use of Convert-Path to ensure that a *full* path is used.
$xmlFileFullPath = Convert-Path -LiteralPath data.xml
$xml.Load($xmlFileFullPath)
# ... modify $xml
# Save the modified document back to the file.
# Note: If you were to write to a *different* file, again be
# sure to specify a *full* path.
$xml.Save($xmlFileFullPath)
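As background to the full-path advice: .NET methods such as .Load() and .Save() resolve relative paths against the process-wide working directory ([Environment]::CurrentDirectory), which usually differs from PowerShell's current location. Here is a minimal sketch of the difference, using a throwaway scratch directory (the directory and variable names are made up for illustration):

```powershell
# Create a scratch directory and make it PowerShell's current location.
$scratchDir = Join-Path ([IO.Path]::GetTempPath()) ("xml-demo-" + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $scratchDir
Push-Location -LiteralPath $scratchDir
try {
    [IO.File]::WriteAllText((Join-Path $scratchDir 'data.xml'), '<root/>')

    # Resolved by PowerShell, relative to its current location.
    $psFullPath = Convert-Path -LiteralPath data.xml

    # Resolved by .NET, relative to [Environment]::CurrentDirectory,
    # which Push-Location / Set-Location do not update.
    $dotNetFullPath = [IO.Path]::GetFullPath('data.xml')

    $xml = [xml]::new()
    $xml.PreserveWhitespace = $true
    $xml.Load($psFullPath)   # a full path works regardless of the process working directory
    $rootName = $xml.DocumentElement.Name
}
finally {
    Pop-Location
    Remove-Item -LiteralPath $scratchDir -Recurse -Force
}
```

Comparing $psFullPath with $dotNetFullPath shows where a relative path passed directly to .Load() would have been looked up instead.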
As for what you tried:
Re Attempt 1
$xml = [xml](get-content data.xml)
Get-Content by default reads a text file line by line and returns an array of lines, so information about the original newlines is invariably lost in the process.
Therefore, this method of loading an XML file is fundamentally unsuited to preserving the original whitespace in the file, as you've discovered yourself. However, as discussed, [xml] [System.IO.File]::ReadAllText("data.xml") and its PowerShell counterpart,[1] [xml] (Get-Content -Raw -LiteralPath data.xml), aren't fully robust either - use .Load() instead.
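The difference between the default line-by-line reading and -Raw can be seen with a small temporary file. This is only an illustrative sketch; the sample content and variable names are made up:

```powershell
$tempFile = [IO.Path]::GetTempFileName()
try {
    # Write a small document with CRLF newlines and no trailing newline.
    [IO.File]::WriteAllText($tempFile, "<root>`r`n  <item>1</item>`r`n</root>")

    $lines = Get-Content -LiteralPath $tempFile        # array of lines, newlines stripped
    $raw   = Get-Content -Raw -LiteralPath $tempFile   # one multi-line string

    $lines.Count            # 3
    $lines[0]               # <root>
    $raw.Contains("`r`n")   # True: the original newlines survive only with -Raw
}
finally {
    Remove-Item -LiteralPath $tempFile
}
```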
Apart from that, preserving the original whitespace requires opt-in, which the cast idiom [xml] (<# XML text, possibly from a file #>) doesn't support, given that the [xml] instance's .PreserveWhitespace property must be set to $true before content is loaded.
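The timing requirement can be checked in memory, without any file involved. In this sketch the sample XML text and the variable names are illustrative only:

```powershell
$xmlText = "<root>`n  <item>1</item>`n</root>"

# Cast idiom: whitespace-only nodes are dropped while the text is parsed.
$viaCast = [xml] $xmlText
$viaCast.PreserveWhitespace = $true   # too late: the content is already loaded

# Opt in first, then load.
$preserving = [xml]::new()            # In PSv4-: New-Object xml
$preserving.PreserveWhitespace = $true
$preserving.LoadXml($xmlText)

$viaCast.OuterXml                     # <root><item>1</item></root>
$preserving.OuterXml -ceq $xmlText    # True
```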
set-content data.xml [String]$xml
As discussed, Set-Content also isn't a robust way to save an XML document to a file. Even if no encoding problems happen to arise, the absence of -NoNewLine (v5+) would result in a platform-native newline getting appended to the file, which may be at odds with the file's original newline format.
Additionally, [String]$xml does not return the XML text of an [xml] instance - you need .OuterXml for that.
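A quick way to see the difference between the string cast and .OuterXml, again with made-up sample content:

```powershell
$xml = [xml]::new()
$xml.PreserveWhitespace = $true
$xml.LoadXml("<root>`n  <item>1</item>`n</root>")

$asString = [string] $xml     # not the document's markup
$markup   = $xml.OuterXml     # the full XML text, whitespace included

$asString -ceq $markup        # False
```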
Re Attempt 2
$xml.PreserveWhitespace = true
This is a simple syntax problem:
PowerShell's Boolean ([bool]) constants are $true and $false, so true should be $true.
Neglecting to use $ does not cause a syntax error, however: it causes true to be interpreted as a command (a PowerShell cmdlet, script, function, external program, ...), and if there is none by that name,[2] an unrecognized-command error is emitted that terminates the statement, so that no property assignment takes place.
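The behavior can be observed as follows. This is a sketch that assumes no custom alias or function named true has been defined in the session; the variable names are illustrative:

```powershell
$xml = [xml]::new()

try {
    # 'true' is parsed as a command name, not as a Boolean literal.
    $xml.PreserveWhitespace = true
}
catch {
    # Reached when no command named 'true' exists (typical on Windows).
    $errorMessage = $_.Exception.Message
}
$afterBareword = $xml.PreserveWhitespace   # still $false in either case

$xml.PreserveWhitespace = $true
$afterLiteral = $xml.PreserveWhitespace    # $true
```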
Re Attempt 3
Result: [regex]::replace messes up the line endings
No: [regex]::Replace() has no effect on line endings (newlines).
(As an aside: consider using PowerShell's -replace operator instead.)
Instead, the problem - loss of newlines due to creating an array of lines - occurred earlier, in your Get-Content call, as previously discussed.
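To confirm that a replacement by itself leaves newlines alone, here is a small sketch with a made-up multi-line string (the pattern and sample text are illustrative only):

```powershell
$text = "<item>old</item>`r`n<item>old</item>`r`n"

# Note: [regex]::Replace() is case-sensitive by default, -replace is not;
# with this all-lowercase sample both produce the same result.
$viaMethod   = [regex]::Replace($text, 'old', 'new')
$viaOperator = $text -replace 'old', 'new'

# Count CRLF sequences before and after the replacement.
$crlfBefore = [regex]::Matches($text, "`r`n").Count
$crlfAfter  = [regex]::Matches($viaMethod, "`r`n").Count

$crlfBefore -eq $crlfAfter      # True
$viaMethod -ceq $viaOperator    # True
```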
[1] Get-Content -Raw is only fully equivalent to [System.IO.File]::ReadAllText() in PowerShell (Core) 7+, which - like .NET APIs - defaults to (BOM-less) UTF-8. Windows PowerShell, by contrast, assumes ANSI encoding when reading a file without a BOM.
[2] On Unix-like platforms, there actually is an external program named true, which produces no output, which - when PowerShell coerces that to a [bool] - becomes $false.