I'm updating some XML files with PowerShell; they originate from a Linux machine. Once I'm done updating, the file is all messed up, with extra spaces etc., and I can't use it.

Changes from:
UNIX (LF) UTF-8

To:
Windows (CR LF) UTF-8-BOM

Does anyone know how to keep the same format when I save the file back?

$myfile = "C:\hrfeed\output\$file"
$stringToXML.save($myfile)

Thank you

Evo
  • I tried a few more things: dos2unix to convert the file, and, since it was UNIX, I also tried the other way around with unix2dos. But no difference. As soon as the file is read and written by PowerShell, I see spaces, extra double quotation marks and square brackets appear. @Theo – Evo Jun 06 '20 at 02:48
  • Those double quotes on the first line were single quotes before, and the square brackets weren't there before. Plus the fact that the format is now Windows (CR LF). – Evo Jun 06 '20 at 02:55

1 Answer

If you want to save the xml as UTF-8 without BOM and with unix style newline characters \n instead of \r\n, you cannot rely on the plain Save() call with a file path on Windows. You need a small function of your own that writes through an XmlWriter with explicit settings.

Using your previous question as example, you could do this:

[xml]$xmldata = @"
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Identity PUBLIC "point.dtd" "point.dtd"[]>
<Identity  created="1525465321820" name="Onboarding - GUI - External">
    <Attributes>
    <Map>
        <entry key="displayName" value="Onboarding - GUI " />
        <entry key="firstname" value="Z Orphaned ID" />
    </Map>
    </Attributes>
</Identity>
"@

# do something with the xml data

To save the xml to file with UNIX style newlines and also in UTF-8 No BOM encoding, you can use this function:

function Out-UnixXml {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true, Mandatory = $true, Position = 0)]
        [xml]$xml,

        [Parameter(Mandatory = $true, Position = 1)]   # only $xml takes pipeline input
        [Alias('FilePath')]
        [string]$Path
    )
    try {
        $settings = [System.Xml.XmlWriterSettings]::new()
        $settings.Indent       = $true                                     # defaults to $false
        $settings.NewLineChars = "`n"                                      # defaults to "`r`n"
        $settings.Encoding     = [System.Text.UTF8Encoding]::new($false)   # $false means No BOM

        $xmlWriter = [System.Xml.XmlWriter]::Create($Path, $settings)

        $xml.WriteTo($xmlWriter)
        $xmlWriter.Flush()
    }
    finally {
        # cleanup
        if ($xmlWriter) { $xmlWriter.Dispose() }
    }
}

And use it like this instead of $xmldata.Save('C:\somefile.xml'):

Out-UnixXml $xmldata 'C:\somefile.xml'
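To check that a saved file really has no BOM and no CR characters, you can inspect its raw bytes. This is only a minimal sketch: Test-UnixUtf8File is a hypothetical helper name, not a built-in cmdlet, and the demo writes to a temp file rather than your real output path.

```powershell
# Hypothetical helper (not a built-in cmdlet): reports whether a file
# starts with a UTF-8 BOM and whether it contains any CR bytes.
function Test-UnixUtf8File {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path   # use a full path; .NET resolves relative paths differently from PowerShell
    )
    # Read raw bytes so no decoding or newline translation takes place
    $bytes = [System.IO.File]::ReadAllBytes($Path)

    # The UTF-8 BOM is the byte sequence EF BB BF at the start of the file
    $hasBom = $bytes.Length -ge 3 -and
              $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

    # 0x0D is carriage return; a file with pure LF newlines contains none
    $hasCr = $bytes -contains 0x0D

    [PSCustomObject]@{
        HasBom = $hasBom
        HasCR  = $hasCr
    }
}

# Demo on a temp file written as UTF-8 without BOM and with LF newlines
$tmp = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($tmp, "<a>`n</a>`n", [System.Text.UTF8Encoding]::new($false))
Test-UnixUtf8File $tmp
Remove-Item $tmp
```

If either property comes back as $true for a file written by Out-UnixXml, something else in the pipeline has rewritten the file afterwards.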

As for the square brackets in the DOCTYPE declaration, see "XmlDocument.Save() inserts empty square brackets in doctype declaration".
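The workaround usually suggested for the brackets is to replace the DOCTYPE node with a copy whose internal subset is null instead of an empty string. The following is a sketch of that idea, not tested against your files. It assumes a shortened version of the sample document above, and it assumes that clearing the resolver is enough to stop .NET from looking for point.dtd.

```powershell
# Shortened sample document; single-quoted here-string so nothing is expanded
[xml]$xmldata = @'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Identity PUBLIC "point.dtd" "point.dtd">
<Identity created="1525465321820" name="Onboarding - GUI - External" />
'@

# Assumption: without a resolver, creating the new DOCTYPE node will not
# try to fetch the external DTD
$xmldata.XmlResolver = $null

$old = $xmldata.DocumentType
if ($null -ne $old -and [string]::IsNullOrEmpty($old.InternalSubset)) {
    # [NullString]::Value passes a real null to the .NET method;
    # a plain $null would be converted to an empty string and keep the "[]"
    $new = $xmldata.CreateDocumentType($old.Name, $old.PublicId, $old.SystemId, [NullString]::Value)
    [void]$xmldata.ReplaceChild($new, $old)
}

# Inspect the result; the DOCTYPE line should no longer end in "[]"
$xmldata.OuterXml
```

Do this after your updates and before calling Out-UnixXml, so the file is written with the replaced node.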

Theo
  • @Evo ..not sure what you mean here.. If you want to loop over a number of xml files, why not `Get-ChildItem -Filter '*.xml' -File | ForEach-Object { [xml]$xmldata = Get-Content -Path $_.FullName -Raw; #code to update the xml; Out-UnixXml $xmldata $_.FullName}` ? – Theo Jun 09 '20 at 13:27
  • Hi Theo, thanks again for the response. Actually, in your example the file is already messed up. I just didn't realize until now. Example Now: BEFORE . NOW: BEFORE: . Also, if you look further down you will see spaces everywhere. Your example worked in terms of saving the format, but all the extra characters showed up the same way. Is there a way to post all the code? It's too much if I try to paste it. Thx – Evo Jun 09 '20 at 13:31
  • Sorry, still trying to figure out this website. Ctrl+K doesn't work for me in comments. – Evo Jun 09 '20 at 13:32
  • I just tried to read and write the file without making any changes to it, and it still adds the spaces, square brackets and double quotes. – Evo Jun 09 '20 at 14:09
  • @Evo Why do you want single quotes in the XML? Usually they are double-quotes and any xml reader should be able to handle those. See [Quotes in XML. Single or double?](https://stackoverflow.com/questions/6800467/quotes-in-xml-single-or-double). Also, I have no idea what you mean by _add the spaces_. What spaces? If you want a compressed XML, just do `$settings.Indent = $false` – Theo Jun 09 '20 at 15:51
  • Thanks @Theo, I'm not sure, but I do know that the system I'm trying to import the file into won't take it anymore. It's broken after PowerShell has read it and written it back. – Evo Jun 09 '20 at 20:28
  • @Evo Then I think you need to look for the errors of that system.. You can always check if **your** xml is valid, for instance [here](https://www.w3schools.com/xml/xml_validator.asp). I have added a link at the bottom of my answer about the square brackets. Please read the answers given there. – Theo Jun 09 '20 at 20:42
  • Hi @Theo, I don't know why, but as soon as it's opened and read/written by PowerShell, the system will no longer import the file. So it definitely has something to do with PowerShell. I've seen Python also break it; something happens to the file. I don't know what. All I see as differences are the spaces, double quotes and square brackets. – Evo Jun 10 '20 at 20:17
  • @Evo Remember that I cannot see your screen and have no knowledge of the system you use that is reluctant to load the xml. Please post a **new question** and give more (much more) details. [1] Add the source XML, [2] explain what the "system" is you are using that is not able to load xml, [3] why it is not able to handle double-quoted values and attribute names, [4] why it cannot handle the square brackets in the DOCTYPE declaration, [5] why it cannot handle UTF-8 with BOM and [6] why it will only accept unix style Newline characters. – Theo Jun 11 '20 at 08:55
  • @Evo **ANY** software dealing with XML should be able to handle all of that, as long as the xml you are feeding it is Well-Formed. Also explain the thing about the _spaces_ you see in detail (add an example). When I test, there are no spaces where they shouldn't be. Do you get any error messages, either from PowerShell or from the system that needs to load the xml? If so, please add these errors **in full** in your new question as well. – Theo Jun 11 '20 at 08:56
  • Thanks @Theo. I think I will just leave this one for now. Already past the deadline. Thanks for the help. – Evo Jun 12 '20 at 01:50