I have two XML files: one with default names and values (Test.xml) and another with just the default names (document.xml). The goal is to replace the default names with the values, but only on the first occurrence.

Here is the Test.xml:

<XML-TEST>
    <MyText>Dies ist ein Test</MyText>
    <MyTexttwo>Dies ist noch ein Test</MyTexttwo>
</XML-TEST>

Here is the document.xml (the relevant part is near the end):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex"
    xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
    xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    mc:Ignorable="w14 w15 w16se wp14">
  <w:body>
    <w:p w:rsidR="00E64ECE" w:rsidRDefault="00E64ECE" w:rsidP="00E64ECE">
      <w:proofErr w:type="spellStart" />
      <w:r>
        <w:t>MyText</w:t>
      </w:r>
      <w:proofErr w:type="spellEnd" />
    </w:p>
    <w:p w:rsidR="00D50239" w:rsidRPr="00E64ECE" w:rsidRDefault="00E64ECE" w:rsidP="00E64ECE">
      <w:r>
        <w:t>MyTexttwo</w:t>
      </w:r>
      <w:bookmarkStart w:id="0" w:name="_GoBack" />
      <w:bookmarkEnd w:id="0" />
    </w:p>
    <w:sectPr w:rsidR="00D50239" w:rsidRPr="00E64ECE">
      <w:pgSz w:w="11906" w:h="16838" />
      <w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
      <w:cols w:space="708" />
      <w:docGrid w:linePitch="360" />
    </w:sectPr>
  </w:body>
</w:document>

What am I doing with PowerShell?

  1. I save the Test.xml (the one with values) in a hashtable:

    PS> $XMLSourceHashtable
    
    Name         Value
    ----         -----
    MyText       Dies ist ein Test
    MyTexttwo    Dies ist noch ein Test
    
  2. Save document.xml into a variable $DocumentXml.

  3. Use foreach to replace what I need:

    foreach ($key in ($XMLSourceHashtable.GetEnumerator())) {
        # If one key.value is "false" replace the 1:1 name with Char
        if ($key | Where-Object {$_.Value -eq "false"}) {
            #$key.Name.Trim()
            #$DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), "☐")
        } elseif ($key | Where-Object {$_.Value -eq "true"}) {
            # If one key.value is "true" replace the 1:1 name with Char
            #$key.Name.Trim()
            #$DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), "☒")
        } else {
            # Everything else needs to be replaced by value in hashtable
            #Write-Host $key.Name.Trim() "--------------" $key.Value.Trim()
            #$DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), $key.Value.Trim())
        }
    }
    

The first two branches (if and elseif) work fine and can be ignored. It's the else branch that I'm concerned about.

What happens?

The text does get replaced, of course, but the Replace method does the following:

Values in document.xml are replaced like this:

"MyText" → "Dies ist ein Test"
"MyTexttwo" → "Dies ist ein Testtwo"

but it should be:

"MyText" → "Dies ist ein Test"
"MyTexttwo" → "Dies ist noch ein Test"
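
A minimal sketch that reproduces this behaviour, assuming a hypothetical `$sample` string in place of the real document.xml:

```powershell
# Hypothetical stand-in for the two text elements in document.xml
$sample = '<t>MyText</t><t>MyTexttwo</t>'

# String.Replace() substitutes every occurrence of the substring,
# including the one at the start of "MyTexttwo"
$result = $sample.Replace('MyText', 'Dies ist ein Test')
$result
```

The second element ends up as "Dies ist ein Testtwo", which is the unwanted result described above.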

The point is that "MyText" is being recognized inside "MyTexttwo". Each name is actually unique, but it isn't handled as unique. I know it's possible to replace only the first occurrence, but only with regex, and I can't convert the XML to regex and back again. Is there something else I can do?

Ansgar Wiechers
J.Doe
  • Please take this general, universal advice. **NEVER, NEVER, NEVER** use string replace tools on XML source code. This is always the completely wrong thing to do. It's hard to explain how wrong that is, because it looks so easy to beginners. Simply settle on never doing that. Learn the right tools (in this case: XPath) and use them. – Tomalak Jul 25 '18 at 07:55
  • @Tomalak Point taken, edited my answer. Maybe you can put your warning inside your answer as a blockquote so it stands out better? – Theo Jul 25 '18 at 09:07
  • @Tomalak Noted! – J.Doe Jul 25 '18 at 10:18

2 Answers

Your approach is much too complicated. Use XPath. In principle - load, modify, save:

$document = New-Object xml
$document.Load('Document.xml')

$element = $document.SelectSingleNode("//some/path")
$element.InnerText = "some new value"

$document.Save('Document_2.xml')

The only slight complication here is that you are dealing with a Word document, and they use XML namespaces (written as xmlns:foo="...namespace URI..." in the XML source), so you need to use namespaces, too (see: Using PowerShell, how do I add multiple namespaces (one of which is the default namespace)?):

$document = New-Object xml
$document.Load('Document.xml')

# use a namespace manager to register the w: namespace prefix
$namespaces = New-Object System.Xml.XmlNamespaceManager $document.NameTable
$namespaces.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')

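# GetEnumerator() yields one Name/Value entry per key; iterating the
# dictionary directly would hand the loop the whole collection at once
foreach ($item in $XMLSourceHashtable.GetEnumerator()) {
    $searchText = $item.Name
    $element = $document.SelectSingleNode("//w:t[.='$searchText']", $namespaces)
    if ($element) {
        $element.InnerText = $item.Value
    }
}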

$document.Save('Document_2.xml')

The "//w:t[.='$searchText']" will be interpolated into XPath expressions like //w:t[.='MyText'] - and this path will select all <w:t> elements in the input XML that have 'MyText' as their value. Using .SelectSingleNode() will return only the first of those, which seems to be what you want.
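
A minimal sketch of that exact-match behaviour, assuming a hypothetical cut-down document held in a here-string instead of the real Document.xml (the variable names `$found` and `$matchCount` are illustrative only):

```powershell
# Hypothetical, cut-down stand-in for document.xml (only the w: namespace)
$document = [xml]@'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>MyText</w:t></w:r></w:p>
    <w:p><w:r><w:t>MyTexttwo</w:t></w:r></w:p>
  </w:body>
</w:document>
'@

$namespaces = New-Object System.Xml.XmlNamespaceManager $document.NameTable
$namespaces.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')

# [.='MyText'] compares the element's whole text, so MyTexttwo is not selected
$found = $document.SelectNodes("//w:t[.='MyText']", $namespaces)
$matchCount = $found.Count
$matchCount

# Change only the matching element, then list the text of every w:t element
$found.Item(0).InnerText = 'Dies ist ein Test'
$texts = @($document.SelectNodes('//w:t', $namespaces) | ForEach-Object { $_.InnerText })
$texts
```

Only one node matches, and the second element keeps its original text "MyTexttwo" after the change.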

You can use .SelectNodes() and another foreach loop to edit all occurrences:

foreach ($element in $document.SelectNodes("//w:t[.='$searchText']", $namespaces)) {
    $element.InnerText = $item.Value
}
Tomalak
  • I'm trying this, please stand by – J.Doe Jul 25 '18 at 08:08
  • See my addition about namespace use, you will probably need this as well here. – Tomalak Jul 25 '18 at 08:11
  • @Ansgar Ahh, now that the XML has been formatted in the question it's much clearer. Yes, I think you are right. – Tomalak Jul 25 '18 at 08:17
  • I guess your way to solve it is the best one so far, but my PS throws an error on $namespaces.AddNamespace: "there's no method AddNamespace". I only have "Add()" as an option. – J.Doe Jul 25 '18 at 08:37
  • @J.Doe You are right, I used code from the linked answer and did not test it. I've corrected my answer. – Tomalak Jul 25 '18 at 08:40
  • It's working so far. Thanks so much, it really cleared things up for me, but this is a much more advanced level of PS for me. Anyhow, I still have one issue: how can I save the XML using a path variable? Such as: $Path = "C:\Test\document.xml" to overwrite the source document.xml // $Document.save($Path) – J.Doe Jul 25 '18 at 09:33
  • Something else I have to say: I could not use ($item in $XMLSourceHashtable). I had to change it to ($item in ($XMLSourceHashtable.GetEnumerator())) for it to work correctly. – J.Doe Jul 25 '18 at 10:20
  • Maybe you need to update your Powershell version? `foreach ($item in @{a=1;b=2}) { $item }` works perfectly fine for me, it's possible they included this syntax later. [I am using version 5.1](https://stackoverflow.com/q/1825585). Saving with a path variable should work just fine. There is nothing wrong with `$document.Save($path)`. – Tomalak Jul 25 '18 at 11:39
  • I'm using 5.1.15063.786, so I can't tell you what's wrong. – J.Doe Jul 25 '18 at 12:26
  • Try the foreach loop in my previous comment. If that works, then `$XMLSourceHashtable` isn't *actually* a hashtable. What does `$XMLSourceHashtable.GetType()` give you? – Tomalak Jul 25 '18 at 12:28
  • OrderedDictionary - System.Object – J.Doe Jul 25 '18 at 13:29
  • Ah, I see, so it's not really a hashtable, that explains things. Well, `GetEnumerator()` works, keep using it. – Tomalak Jul 25 '18 at 14:11
  • @J.Doe Alternatively you can explicitly convert it: `$XMLSourceHashtable = [hashtable]$XMLSourceHashtable`. After that it will start acting normally. It depends on how you create the variable, so you could change that part, too. – Tomalak Jul 25 '18 at 14:23

Although Tomalak's advice to NEVER use string replacement on XML is good advice, here is an answer to your question: "The point is, that "MyText" is being recognized in "MyTexttwo". Each "Name" is actual unique but its not handled like it's unique".

The Replace method you use does not match the WHOLE string. "MyTexttwo" starts with "MyText", so in your loop that part of the name is replaced, and "MyTexttwo" then no longer exists.

If you still want to use string replacement, but only replace when the complete name matches and not just part of it, I would suggest:

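$nameToReplace = [regex]::Escape($key.Name.Trim())
$DocumentXml.InnerXml = $DocumentXml.InnerXml -replace "\b$nameToReplace\b", $key.Value.Trim()

The \b symbols are word-boundary anchors (positional assertions). They tell the regex that the name must not be directly preceded or followed by another word character, so "MyText" no longer matches inside "MyTexttwo". Anchoring with \A and \z does not work here, because they mark the start and end of the entire InnerXml string rather than of a single name.

If you also need to be sure that the replacement only takes place when the casing matches, you can use

$nameToReplace = [regex]::Escape($key.Name.Trim())
$DocumentXml.InnerXml = $DocumentXml.InnerXml -creplace "\b$nameToReplace\b", $key.Value.Trim()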
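
A minimal sketch of the difference between the two operators, assuming a hypothetical `$sample` string whose casing differs from the pattern:

```powershell
# Hypothetical sample whose casing differs from the pattern
$sample = 'mytext'

# -replace ignores case by default, so the lower-case text is replaced
$insensitive = $sample -replace 'MyText', 'X'

# -creplace is case-sensitive, so nothing matches and the text is unchanged
$sensitive = $sample -creplace 'MyText', 'X'

$insensitive   # X
$sensitive     # mytext
```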
Theo
  • Don't use regex or string replace on XML. That's a horrible thing to do. – Tomalak Jul 25 '18 at 07:56
  • Hello Theo, that's not working. Unfortunately, as soon as I put $key.Name inside quotes "" in any way, the variable is no longer resolved. Therefore no changes are made in document.xml. – J.Doe Jul 25 '18 at 08:07