I have several hundred XML files which I need to make a slight change to. I'm aware that I really should be using XSLT to make batch changes to XML structure, but I thought some quick and dirty regex would do what I need much faster than working out the XSLT. At least I thought that before spending hours trying to get the regex right!
Take the example below. What I have is various <seqlist> lists which contain an <item> element for each item in the list. Each <item> element contains a <para> element with an id attribute whose value varies. I want to remove those <para> tags completely so that the <item> contains the actual text.
So from: <seqlist><item><para id="1.1">Some text here.</para></item></seqlist>
To: <seqlist><item>Some text here.</item></seqlist>
This is fairly straightforward in itself. I can simply do:
Regex: <item><para id="([^\"]*)">
Replace: <item>
Then I remove the redundant closing tags with a simple find and replace:
Find: </para></item>
Replace: </item>
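For reference, the two steps above translate directly to Python's re module. This is a sketch only: the editor's regex flavour may behave slightly differently, the capture group in the original pattern is dropped because the replacement never uses it, and the function name is hypothetical.

```python
import re

# Step 1: drop an opening <para id="..."> that directly follows <item>.
OPEN_PARA = re.compile(r'<item><para id="[^"]*">')


def strip_item_paras_naive(xml: str) -> str:
    """Reproduce the two-step find/replace described above."""
    xml = OPEN_PARA.sub('<item>', xml)
    # Step 2: plain (non-regex) replacement of the closing pair.
    return xml.replace('</para></item>', '</item>')


if __name__ == "__main__":
    simple = '<seqlist><item><para id="1.1">Some text here.</para></item></seqlist>'
    print(strip_item_paras_naive(simple))
    # prints <seqlist><item>Some text here.</item></seqlist>

    # A line whose <para> has no id and does not follow <item>: step 1
    # leaves the opening tag alone, but step 2 still removes </para>.
    line = '<para>Both at 120 W maximum output power.</para></item>'
    print(strip_item_paras_naive(line))
    # prints <para>Both at 120 W maximum output power.</item>
```

The second call shows the failure on lines like the last one in the example: the opening <para> survives but its closing tag does not.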
However, as the example below shows, some <item> elements in the list contain another <seqlist> nested within them, which in turn contains further <item> and <para> tags. This means the find and replace above, meant to remove the closing </para> tag, also removes the closing </para> on the very last line of the example, which belongs to a <para> that was never stripped.
Basically, what I need to say is: find </para></item> and replace it with </item> UNLESS there is an opening <para> tag to the left of it.
The very last line of the example below shows this best: if I do the above find and replace, the last </para> is removed and the file no longer parses.
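One way to express that rule without a lookbehind is to match the opening and closing tags together in a single pattern, so a </para></item> is only touched when its own <item><para id="..."> is on the same line with no other <para> or </para> in between. The sketch below does that in Python with a "tempered" (?:(?!...).)* loop; the function name is hypothetical. As far as I can tell, this also avoids a second problem with step 1: it strips <para id="p7.1"> and <para id="p7.2"> in the example, whose closing tags are not followed by </item>, leaving those unbalanced too.

```python
import re

# <item><para id="...">TEXT</para></item>, where TEXT contains no other
# <para> or </para>. The (?!</?para\b) check stops the loop at any para
# tag, and "." does not match a newline (no re.DOTALL), so a match can
# never span lines either.
ITEM_PARA = re.compile(
    r'<item><para id="[^"]*">((?:(?!</?para\b).)*)</para></item>'
)


def strip_item_paras(xml: str) -> str:
    """Unwrap single-para items in one pass; leave everything else alone."""
    return ITEM_PARA.sub(r'<item>\1</item>', xml)


if __name__ == "__main__":
    print(strip_item_paras('<item><para id="p7.1.1">12 V or 15 V,0-5A</para></item>'))
    # prints <item>12 V or 15 V,0-5A</item>
```

On the example below, this should only change the four inner items; the outer items for p7.1 and p7.2 keep their <para> tags, which may or may not be the result wanted for those.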
Any ideas how to achieve this, please?
<seqlist>
<item><para id="p7.1"><emphasis>JRK Type 1</emphasis>: (NSP XX-XX-XXX-XXXX)
outputs:
<seqlist>
<item><para id="p7.1.1">12 V or 15 V,0-5A</para></item>
<item><para id="p7.1.2">12 V or 15 V,0-5A</para></item>
</seqlist></para>
<para>Both at 120 W maximum output power.</para><para>The outputs are isolated, permitting parallel or serial connection to provide power as required.</para></item>
<item><para id="p7.2"><emphasis>JRK Type 2:</emphasis> (NSN 6130-99-788-6945) outputs:</para>
<seqlist>
<item><para id="p7.2.1">5 V, 0 - 30 A</para></item>
<item><para id="p7.2.2">12 V, 0 - 0.5 A</para></item>
</seqlist><para>Both at 120 W maximum output power.</para>
<para>The 12 V outputs are measured with respect to a common 0 V line but these are isolated from the 5 V output.</para></item>
</seqlist>
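Since XSLT is mentioned as the proper tool, it may be worth noting that any real XML parser sidesteps the tag-matching problem, because it works on elements rather than text. Below is a minimal sketch using Python's standard xml.etree.ElementTree (a swapped-in technique, not XSLT), assuming the rule is "unwrap a <para> when it is the only child of its <item> and has no loose text beside it". The names unwrap_single_para_items and process_file are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def unwrap_single_para_items(root: ET.Element) -> None:
    """Turn <item><para ...>content</para></item> into <item>content</item>.

    Only items whose sole child is a <para> (with nothing but whitespace
    around it) are changed; items holding several paras or a nested
    <seqlist> are left as they are.
    """
    # Snapshot the items first, because the loop rearranges children.
    for item in list(root.iter("item")):
        children = list(item)
        if len(children) != 1 or children[0].tag != "para":
            continue
        para = children[0]
        if (item.text or "").strip() or (para.tail or "").strip():
            continue  # loose text next to the para: leave it alone
        item.remove(para)
        item.text = para.text    # text before the para's first child
        item.extend(list(para))  # move <emphasis> etc. up one level


def process_file(path: Path) -> None:
    """Hypothetical helper: rewrite one file in place."""
    tree = ET.parse(path)
    unwrap_single_para_items(tree.getroot())
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    src = '<seqlist><item><para id="1.1">Some text here.</para></item></seqlist>'
    root = ET.fromstring(src)
    unwrap_single_para_items(root)
    print(ET.tostring(root, encoding="unicode"))
    # prints <seqlist><item>Some text here.</item></seqlist>
```

Looping process_file over the files (for example with pathlib.Path.glob("*.xml")) would apply it in bulk. Be aware that ElementTree re-serializes the whole document, so quoting and layout may change and things it does not keep, such as a DOCTYPE or comments, may be lost; trying it on copies first seems wise.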