I need to remove every tag from an XML document that contains a certain text.

Example:

<root-element>
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
    <tag-name third:line="some-value">bla-bla</tag-name>
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
</root-element>

So for each tag with a first:line attribute in the XML document, I want to remove the whole tag.

2 Answers


You'll need to use an XML parsing library.

I recommend lxml.

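Then build an XPath selector that applies the string-length() function to the element's text(). This way it selects any element with text inside. Wrapping text() in normalize-space() keeps elements that only contain indentation whitespace (such as the root) from matching.

import lxml.etree as et

# `xml` is assumed to hold the document as a str or bytes value
tree = et.fromstring(xml)

# Select every element whose first text node is non-blank
for bad in tree.xpath("//*[string-length(normalize-space(text())) > 0]"):
    parent = bad.getparent()
    if parent is not None:  # the root element has no parent and cannot be removed
        parent.remove(bad)

# tostring() returns bytes here, so decode before printing
print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())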
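
To remove only the tags that carry the first:line attribute from the question, something along these lines may work. It is a minimal sketch: the sample in the question never declares the first:, second: and third: prefixes, so the urn:example:... URIs, the NS mapping and the remove_first_line helper are all hypothetical names made up for illustration.

```python
import lxml.etree as et

# Hypothetical namespace URIs: the question's sample does not declare its
# prefixes, and lxml refuses to parse undeclared prefixes.
XML = b"""<root-element xmlns:first="urn:example:first"
              xmlns:second="urn:example:second"
              xmlns:third="urn:example:third">
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
    <tag-name third:line="some-value">bla-bla</tag-name>
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
</root-element>"""

# Prefix-to-URI mapping used by the XPath expression (assumed URI, see above)
NS = {"first": "urn:example:first"}


def remove_first_line(xml_bytes):
    """Parse the document and drop every element with a first:line attribute."""
    root = et.fromstring(xml_bytes)
    for bad in root.xpath("//*[@first:line]", namespaces=NS):
        parent = bad.getparent()
        if parent is not None:  # the root element has no parent and is kept
            parent.remove(bad)
    return root


root = remove_first_line(XML)
print(et.tostring(root, pretty_print=True).decode())
```

The prefix used in the XPath expression only has to match a key of the namespaces mapping; what lxml compares against the document is the URI it is bound to.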
João A. Veiga
  • Thanks João A. Veiga. Where should I put "first:line" in the xpath while iterating? –  Dec 15 '21 at 22:54
  • I'm not sure I understood what you're trying to do. Would you like to delete only the tags that have the attribute first:line? In that case the XPath would be: //*[string-length(text()) > 0 and @first:line] – João A. Veiga Dec 16 '21 at 00:16

Here is how to do it via XSLT.

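The XSLT uses the so-called Identity Transform pattern: the first template copies every node and attribute unchanged, and the second, empty template suppresses any element that has the firstline attribute.

I modified the XML and removed the bogus namespace prefixes.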

Input XML

<?xml version="1.0"?>
<root-element>
    <tag-name firstline="some-value">bla-bla</tag-name>
    <tag-name secondline="some-value">bla-bla</tag-name>
    <tag-name thirdline="some-value">bla-bla</tag-name>
    <tag-name firstline="some-value">bla-bla</tag-name>
    <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*[@firstline]"/>
</xsl:stylesheet>

Output XML

<root-element>
  <tag-name secondline="some-value">bla-bla</tag-name>
  <tag-name thirdline="some-value">bla-bla</tag-name>
  <tag-name secondline="some-value">bla-bla</tag-name>
</root-element>
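
Since the question asks for Python, the same stylesheet can be applied with lxml's XSLT support. The sketch below keeps the colon in the attribute name by declaring a prefix in both the document and the stylesheet; the urn:example:... URIs are hypothetical, chosen only so that the sample parses.

```python
import lxml.etree as et

# Identity transform plus an empty template for elements with first:line.
# The first: prefix is bound to a made-up URI for illustration.
XSLT = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:first="urn:example:first">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*[@first:line]"/>
</xsl:stylesheet>"""

# Sample input with the same (assumed) namespace bindings
XML = b"""<root-element xmlns:first="urn:example:first"
              xmlns:second="urn:example:second">
    <tag-name first:line="some-value">bla-bla</tag-name>
    <tag-name second:line="some-value">bla-bla</tag-name>
    <tag-name first:line="some-value">bla-bla</tag-name>
</root-element>"""

# Compile the stylesheet once, then call it on the parsed input
transform = et.XSLT(et.fromstring(XSLT))
result = transform(et.fromstring(XML))
kept = result.getroot()

# str() serializes the result according to xsl:output
print(str(result))
```

The compiled transform object can be reused for any number of input documents, which may be convenient when many files need the same cleanup.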
Yitzhak Khabinsky
  • I have to use Python. The attribute name does contain a colon, for example xml:lang –  Dec 15 '21 at 22:51
  • Keep using Python, just use its library to handle the XSLT transformation. https://stackoverflow.com/questions/16698935/how-to-transform-an-xml-file-using-xslt-in-python – Yitzhak Khabinsky Dec 15 '21 at 23:02