The following Perl regular expression search string, used with an empty replace string, deletes all outermost bPoint elements from the file in a text editor like UltraEdit. It requires that the XML file is well formatted, with the elements on separate lines and with correct indentation as in the example.
^([\t ]*)<bPoint.*?>[\s\S]+?\n\1</bPoint>[\t ]*(?:\r?\n|$)
UltraEdit has the command XML Convert to CR/LFs in menu Format to produce a well-formatted XML file.
Expression explanation:
^
... start search at beginning of a line.
([\t ]*)
... find 0 or more tabs or spaces at beginning of a line and mark them for back referencing.
<bPoint.*?>
... find the start tag of element bPoint.
[\s\S]+?
... find any character including line terminators 1 or more times, non-greedy.
\n\1</bPoint>
... find a line-feed, exactly the same tabs and/or spaces as at the beginning of the found string, and the end tag of element bPoint. The exact number of tabs/spaces from the beginning of the line to the end tag is the reason why the inner bPoint elements are ignored by this search string.
[\t ]*(?:\r?\n|$)
... find 0 or more tabs or spaces followed by either an optional carriage return and a line-feed, OR the end of the file in case the element bPoint ends on the last line of the file with no line terminator.
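The effect of the back-reference can be checked with a small JavaScript sketch. The sample string and all variable names are made up for illustration, and the m flag is an assumption added here so that ^ matches at the beginning of every line, as it does in the editor search.

```javascript
// Hypothetical sample: one outer bPoint element containing an indented inner one.
var sSample =
    '<bPoint id="1">\n' +
    '    <bPoint id="2">\n' +
    '    </bPoint>\n' +
    '</bPoint>\n' +
    '<tag>kept</tag>\n';

// Same expression as explained above; the m flag (an assumption for this sketch)
// makes ^ match at the beginning of each line. The slash in the end tag is
// escaped because it would otherwise terminate the regex literal.
var rOuter = /^([\t ]*)<bPoint.*?>[\s\S]+?\n\1<\/bPoint>[\t ]*(?:\r?\n|$)/m;

var aMatch = sSample.match(rOuter);
var sIndent = aMatch[1];   // indentation captured in front of the start tag
var sMatched = aMatch[0];  // complete matched block
var sRest = sSample.replace(rOuter, "");

// The indented inner end tag does not satisfy \n\1</bPoint> because \1 is an
// empty string here, so the match extends to the end tag of the outer element.
console.log(JSON.stringify(sIndent));
console.log(JSON.stringify(sMatched));
console.log(JSON.stringify(sRest));
```

In this sketch the captured indentation is empty, the matched block contains the inner element, and only the tag line remains after the replace.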
JavaScript:
In a JavaScript script with the well-formatted XML block held in a string variable, the code to remove all outermost bPoint elements would be:
// String variable sXmlBlock contains the well formatted XML block.
do
{
    var nXmlBlockLength = sXmlBlock.length;
    sXmlBlock = sXmlBlock.replace(/(^|\n)([\t ]*)<bPoint.*?>[\s\S]+?\n\2<\/bPoint>[\t ]*(?:\r?\n|$)/g,"$1");
}
while ((sXmlBlock.length < nXmlBlockLength) && (sXmlBlock.length > 0));
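The loop is necessary when multiple bPoint elements follow each other directly, as in the example, and all of them should be removed from the XML block. A single pass is not enough in this case because the line-feed in front of the second element is already consumed by the match of the first element, so (^|\n) cannot match there anymore.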
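The need for the loop can be seen with a short sketch. The sample string and the helper name removeOuterBPoints are hypothetical and only used for illustration; the expression itself is the one from the script above.

```javascript
// The search expression from the script above.
var rOuter = /(^|\n)([\t ]*)<bPoint.*?>[\s\S]+?\n\2<\/bPoint>[\t ]*(?:\r?\n|$)/g;

// Hypothetical sample with two bPoint elements directly in series.
var sTwoInSeries =
    '<tag>a</tag>\n' +
    '<bPoint id="1">\n' +
    '    <content src="p1" />\n' +
    '</bPoint>\n' +
    '<bPoint id="2">\n' +
    '    <content src="p2" />\n' +
    '</bPoint>\n' +
    '<tag>b</tag>\n';

// A single pass removes only the first element of the series.
var sOnePass = sTwoInSeries.replace(rOuter, "$1");

// Hypothetical helper wrapping the loop: repeat until a pass removes nothing.
function removeOuterBPoints(sXmlBlock)
{
    var nXmlBlockLength;
    do
    {
        nXmlBlockLength = sXmlBlock.length;
        sXmlBlock = sXmlBlock.replace(rOuter, "$1");
    }
    while ((sXmlBlock.length < nXmlBlockLength) && (sXmlBlock.length > 0));
    return sXmlBlock;
}

var sAllPasses = removeOuterBPoints(sTwoInSeries);

console.log(JSON.stringify(sOnePass));
console.log(JSON.stringify(sAllPasses));
```

After one pass the element with id 2 is still present in this sample; after the loop only the two tag lines remain.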
For this input XML block:
<tag>value 1</tag>
<bPoint id="1" >
    <bLabel>
        <text></text>
    </bLabel>
    <content src="p112" />
    <bPoint id="2">
        <bLabel>
            <text>xxx</text>
        </bLabel>
        <content src="p1123" />
    </bPoint>
</bPoint>
<bPoint id="bPoint-2" >
    <bLabel>
        <text>xxx</text>
    </bLabel>
    <content src="p1124" />
</bPoint>
<tag>value 2</tag>
<bPoint id="bPoint-3" >
    <bLabel>
        <text>xxx</text>
    </bLabel>
    <content src="p1125" />
</bPoint>
the script code produces as output:
<tag>value 1</tag>
<tag>value 2</tag>
It is of course possible to modify the search expression to remove just a specific bPoint element based on a criterion. But the question does not make clear what should be removed and what the criterion for the removal is. An example showing the input to the script and the expected output of the script, together with an explanation of the criteria, would have helped a lot to understand the requirements for the replacement task.
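As one possible illustration of such a modification, the sketch below removes only the bPoint element with a given id attribute value. The criterion itself, the helper name removeBPointById and the sample string are assumptions made up here, since the real requirement is not known; the sketch also assumes that the id value is enclosed in double quotes.

```javascript
// Hypothetical helper: remove only the bPoint element whose id attribute equals sId.
function removeBPointById(sXmlBlock, sId)
{
    // Escape regex metacharacters so that the id is matched literally.
    var sEscapedId = sId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

    // Same structure as the expression above, but the start tag must
    // contain id="..." with the wanted value (assumed criterion).
    var rById = new RegExp(
        "(^|\\n)([\\t ]*)<bPoint[^>]*\\bid=\"" + sEscapedId + "\"[^>]*>" +
        "[\\s\\S]+?\\n\\2</bPoint>[\\t ]*(?:\\r?\\n|$)");

    return sXmlBlock.replace(rById, "$1");
}

// Hypothetical sample with two bPoint elements.
var sSample =
    '<tag>value 1</tag>\n' +
    '<bPoint id="bPoint-2" >\n' +
    '    <content src="p1124" />\n' +
    '</bPoint>\n' +
    '<bPoint id="bPoint-3" >\n' +
    '    <content src="p1125" />\n' +
    '</bPoint>\n';

var sResult = removeBPointById(sSample, "bPoint-3");
console.log(sResult);
```

No loop is used in this sketch because at most one element is expected to have the given id.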