I want to strip the XML markup from more than 100 XML files, keeping only the text, and I want to do it with PowerShell. Here is one sample file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="../../../helpproject.xsl" ?>
<topic template="Default" lasteditedby="liliya" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../../../helpproject.xsd">
<title translate="true">Passwörter verwalten</title>
<body>
<header>
<para styleclass="Heading1"><text styleclass="Heading1"
translate="true">Passwörter verwalten</text></para>
</header>
<para styleclass="Normal"><table styleclass="container" rowcount="3"
colcount="2" style="width:970px;">
<tr style="vertical-align:top">
<td style="width:50%;">
<para styleclass="H1"><text styleclass="H1"
translate="true">Passwörter verwalten</text></para>
</td>
<td style="width:50%;">
<para styleclass="Image"><image src="manage_passwords.PNG"
scale="100.00%" styleclass="Image"><title translate="true">Passwörter
verwalten</title></image></para>
</td>
</tr>
</table></para>
<para styleclass="txt"/>
In Notepad++, after replacing the regexes <.+?> and ^\s+ with nothing, I see just the text!
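For comparison, here is a rough sketch of how those two replacements might look in PowerShell when applied to a single string. The variable $sample is a made-up excerpt standing in for one file's content, and the (?s) flag is an addition (not part of the Notepad++ pattern) so that tags wrapped across lines are matched as well.

```powershell
# Hypothetical excerpt standing in for the content of one file.
$sample = @'
<header>
  <para styleclass="Heading1"><text styleclass="Heading1"
  translate="true">Passwörter verwalten</text></para>
</header>
'@

# (?s) lets '.' match line breaks, so a tag split over two lines is removed too.
$noTags = $sample -replace '(?s)<.+?>', ''

# Equivalent of the ^\s+ step: split into lines, trim each, drop the empty ones.
$lines = $noTags -split '\r?\n' |
    ForEach-Object { $_.Trim() } |
    Where-Object { $_ -ne '' }

$lines
```

Splitting and trimming is used here instead of a multiline ^\s+ pattern because it also removes trailing whitespace and leaves one clean line per text fragment.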
With this script I copy the originals (to leave them unchanged) into a single folder, and then I just want to remove the XML tags:
Get-ChildItem -Path "C:\Users\cas\Documents\Wurzel_XML\" -Recurse |
Where-Object Name -like "*.xml" |
Copy-Item -Destination "C:\Users\cas\Documents\check_xml\"
$newText = ($newText -replace "<.*?>", "").trim()|?{$_ -ne ''}
Get-ChildItem -Path "C:\Users\cas\Documents\check_xml\" |
Set-Content -Value $newText
But after that, all the files are completely empty.
I previously tried:
$newText = ($newText -replace "(?ms)^\s+<.*?</.*?>", "")
Get-ChildItem -Path "C:\Users\cas\Documents\check_xml\" |
Set-Content -Value $newText
with the same result.
What am I doing wrong with that regex?
Thanks in advance,
Gooly
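
One possible explanation, judging only from the snippets shown: $newText is never assigned from any file, so the -replace runs on an empty value and Set-Content writes that empty result into every file. The regex itself may not be the problem. A minimal sketch of a per-file approach follows. The function name Remove-XmlMarkup and its -Path parameter are made up for illustration, UTF-8 is assumed from the XML declaration in the sample, and -Raw requires PowerShell 3.0 or later.

```powershell
# Hypothetical helper: strips tags from every *.xml file in one folder, in place.
function Remove-XmlMarkup {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    Get-ChildItem -Path $Path -Filter '*.xml' | ForEach-Object {
        $file = $_

        # Read the whole file as a single string so tags spanning lines can match.
        $text = Get-Content -LiteralPath $file.FullName -Raw -Encoding UTF8

        # Remove the tags, then trim every line and keep only the non-empty ones.
        $lines = ($text -replace '(?s)<.+?>', '') -split '\r?\n' |
            ForEach-Object { $_.Trim() } |
            Where-Object { $_ -ne '' }

        # Write the remaining text back to the same file.
        Set-Content -LiteralPath $file.FullName -Value $lines -Encoding UTF8
    }
}
```

It could then be run against the copies, for example as Remove-XmlMarkup -Path "C:\Users\cas\Documents\check_xml\". Note that a regex does not decode entities such as &amp;amp; and does not treat comments or CDATA sections specially, so this is only adequate for simple files like the sample.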