I have the following XML/KML file (only part of the data is shown below).

I want to remove specific elements and their contents via XSLT (I'm using Notepad++ with the XML Tools plugin). The file is very big, and using XSLT is mandatory.

I want to remove the <Snippet> elements, and also remove the <p> elements (tags and contents) from the text inside the <description> elements.

For example, one raw entry is like this:

<Placemark><name>Wando</name><Snippet>Record 325</Snippet><description><![CDATA[<p>Data source: <a href="https://mrdata.usgs.gov/ofr-2005-1294/" title="">Major mineral deposits worldwide</a></p><p>[<a href="https://mrdata.usgs.gov/major-deposits/show-ofr20051294.php?gid=325">All data for record 325</a>]</p><table border='1' padding='3' cellspacing='0'><tr valign='top'><th align='right' bgcolor='#ddffee' title='Generalized type of deposit'>Deposit type</th><td>Hydrothermal</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='Country in which the site is located'>Country</th><td>Korean Peninsula</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='State in which the site is located, for US sites'>State</th><td></td></tr></table>]]></description><styleUrl>#defaultStyleMap</styleUrl><Point><altitudeMode>relativeToGround</altitudeMode><coordinates>126.6833,34.35,0</coordinates></Point></Placemark>

After the XSLT transformation I want to get:

<Placemark><name>Wando</name><description><![CDATA[<table border='1' padding='3' cellspacing='0'><tr valign='top'><th align='right' bgcolor='#ddffee' title='Generalized type of deposit'>Deposit type</th><td>Hydrothermal</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='Country in which the site is located'>Country</th><td>Korean Peninsula</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='State in which the site is located, for US sites'>State</th><td></td></tr></table>]]></description><styleUrl>#defaultStyleMap</styleUrl><Point><altitudeMode>relativeToGround</altitudeMode><coordinates>126.6833,34.35,0</coordinates></Point></Placemark>

P.S. Is it also possible to remove the <![CDATA[ and ]]> wrapper while keeping the <table>?

I really need the <table>, for example:

<Placemark><name>Wando</name><description><table border='1' padding='3' cellspacing='0'><tr valign='top'><th align='right' bgcolor='#ddffee' title='Generalized type of deposit'>Deposit type</th><td>Hydrothermal</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='Country in which the site is located'>Country</th><td>Korean Peninsula</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='State in which the site is located, for US sites'>State</th><td></td></tr></table></description><styleUrl>#defaultStyleMap</styleUrl><Point><altitudeMode>relativeToGround</altitudeMode><coordinates>126.6833,34.35,0</coordinates></Point></Placemark>

The entire RAW data:

<?xml version="1.0" encoding="UTF-8"?> <kml xmlns="http://earth.google.com/kml/2.2/">   <Document>
    <name>Major mineral deposits of the world</name>
    <description>Regional locations and general geologic setting of known deposits of major nonfuel mineral commodities. Originally compiled in five parts by diverse authors, combined here for convenience despite likely inconsistencies among the regional reports.</description>
    <Placemark><name>Wando</name><Snippet>Record 325</Snippet><description><![CDATA[<p>Data source: <a href="https://mrdata.usgs.gov/ofr-2005-1294/" title="">Major mineral deposits worldwide</a></p><p>[<a href="https://mrdata.usgs.gov/major-deposits/show-ofr20051294.php?gid=325">All data for record 325</a>]</p><table border='1' padding='3' cellspacing='0'><tr valign='top'><th align='right' bgcolor='#ddffee' title='Generalized type of deposit'>Deposit type</th><td>Hydrothermal</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='Country in which the site is located'>Country</th><td>Korean Peninsula</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='State in which the site is located, for US sites'>State</th><td></td></tr></table>]]></description><styleUrl>#defaultStyleMap</styleUrl><Point><altitudeMode>relativeToGround</altitudeMode><coordinates>126.6833,34.35,0</coordinates></Point></Placemark>
    <Placemark><name>McDonald</name><Snippet>Record 549</Snippet><description><![CDATA[<p>Data source: <a href="https://mrdata.usgs.gov/ofr-2005-1294/" title="">Major mineral deposits worldwide</a></p><p>[<a href="https://mrdata.usgs.gov/major-deposits/show-ofr20051294.php?gid=549">All data for record 549</a>]</p><table border='1' padding='3' cellspacing='0'><tr valign='top'><th align='right' bgcolor='#ddffee' title='Generalized type of deposit'>Deposit type</th><td>Hydrothermal</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='Country in which the site is located'>Country</th><td>United States</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='State in which the site is located, for US sites'>State</th><td>Montana</td></tr></table>]]></description><styleUrl>#defaultStyleMap</styleUrl><Point><altitudeMode>relativeToGround</altitudeMode><coordinates>-112.525,47,0</coordinates></Point></Placemark>
    <Placemark><name>Montana Mountains</name><Snippet>Record 575</Snippet><description><![CDATA[<p>Data source: <a href="https://mrdata.usgs.gov/ofr-2005-1294/" title="">Major mineral deposits worldwide</a></p><p>[<a href="https://mrdata.usgs.gov/major-deposits/show-ofr20051294.php?gid=575">All data for record 575</a>]</p><table border='1' padding='3' cellspacing='0'><tr valign='top'><th align='right' bgcolor='#ddffee' title='Generalized type of deposit'>Deposit type</th><td>Hydrothermal</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='Country in which the site is located'>Country</th><td>United States</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='State in which the site is located, for US sites'>State</th><td>Nevada</td></tr></table>]]></description><styleUrl>#defaultStyleMap</styleUrl><Point><altitudeMode>relativeToGround</altitudeMode><coordinates>-118.108,41.767,0</coordinates></Point></Placemark>
    <Placemark><name>Basay</name><Snippet>Record 429</Snippet><description><![CDATA[<p>Data source: <a href="https://mrdata.usgs.gov/ofr-2005-1294/" title="">Major mineral deposits worldwide</a></p><p>[<a href="https://mrdata.usgs.gov/major-deposits/show-ofr20051294.php?gid=429">All data for record 429</a>]</p><table border='1' padding='3' cellspacing='0'><tr valign='top'><th align='right' bgcolor='#ddffee' title='Generalized type of deposit'>Deposit type</th><td>Hydrothermal</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='Country in which the site is located'>Country</th><td>Philippines</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='State in which the site is located, for US sites'>State</th><td></td></tr></table>]]></description><styleUrl>#defaultStyleMap</styleUrl><Point><altitudeMode>relativeToGround</altitudeMode><coordinates>122.6333,9.5667,0</coordinates></Point></Placemark>
    <Placemark><name>Georgina Basin</name><Snippet>Record 52</Snippet><description><![CDATA[<p>Data source: <a href="https://mrdata.usgs.gov/ofr-2005-1294/" title="">Major mineral deposits worldwide</a></p><p>[<a href="https://mrdata.usgs.gov/major-deposits/show-ofr20051294.php?gid=52">All data for record 52</a>]</p><table border='1' padding='3' cellspacing='0'><tr valign='top'><th align='right' bgcolor='#ddffee' title='Generalized type of deposit'>Deposit type</th><td>Sedimentary</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='Country in which the site is located'>Country</th><td>Australia</td></tr><tr valign='top'><th align='right' bgcolor='#ddffee' title='State in which the site is located, for US sites'>State</th><td></td></tr></table>]]></description><styleUrl>#defaultStyleMap</styleUrl><Point><altitudeMode>relativeToGround</altitudeMode><coordinates>139.9667,-21.8833,0</coordinates></Point></Placemark>
    <Style id="default_highlight"><BalloonStyle><text>Major Mineral Deposits</text></BalloonStyle><IconStyle><scale>1.5</scale><Icon><href>https://mrdata.usgs.gov/images/mine-32.png</href></Icon></IconStyle><LabelStyle><color>ffffffff</color></LabelStyle></Style><Style id="default_normal"><IconStyle><scale>1</scale><Icon><href>https://mrdata.usgs.gov/images/mine-32.png</href></Icon></IconStyle><LabelStyle><color>00ffffff</color></LabelStyle></Style><StyleMap id="defaultStyleMap"><Pair><key>normal</key><styleUrl>#default_normal</styleUrl></Pair><Pair><key>highlight</key><styleUrl>#default_highlight</styleUrl></Pair></StyleMap> </Document> </kml>

2 Answers

Removing the Snippet element is trivial: use the identity transform template and add an empty template matching Snippet.

Converting the escaped text inside the CDATA sections into markup is not trivial. One option is to use disable-output-escaping when writing the output to a file, then process the resulting file with a second stylesheet. Another is to move up to a processor that supports XSLT 3.0 (or one that has an extension function for parsing escaped markup).


Demo: https://xsltfiddle.liberty-development.net/6r5Gh39
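
For the XSLT 3.0 route, a minimal sketch might look like the following. It assumes an XSLT 3.0 processor (so it will not run in tools limited to XSLT 1.0), and it assumes that the text of every Placemark description is a well-formed XML fragment, as it is in the sample; HTML with unclosed tags would make parse-xml-fragment() fail. The kml prefix is chosen here for illustration and is bound to the namespace URI declared in the sample file.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:kml="http://earth.google.com/kml/2.2/">

  <!-- XSLT 3.0 shorthand for the identity transform -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- drop every Snippet element -->
  <xsl:template match="kml:Snippet"/>

  <!-- only Placemark descriptions hold escaped markup;
       the Document-level description is plain text and is copied as-is -->
  <xsl:template match="kml:Placemark/kml:description">
    <xsl:copy>
      <!-- parse the escaped text into nodes and keep only the table -->
      <xsl:copy-of select="parse-xml-fragment(.)/table"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The parsed table element is in no namespace, so the serializer will add xmlns="" to it when it is written inside the KML description element.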


Another option you may consider is to "hack" the escaped markup by cutting off the substring preceding the table part using a simple string manipulation:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Snippet"/>

<xsl:template match="description">
    <xsl:copy>
        <xsl:variable name="len" select="string-length(substring-before(., '&lt;table'))" />
        <xsl:value-of select="substring(., $len + 1)" disable-output-escaping="yes"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Demo: https://xsltfiddle.liberty-development.net/6r5Gh39/1
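
Note that the full sample file declares a default namespace on its kml root element, so unprefixed patterns such as match="Snippet" will not match anything in it. A namespace-aware variant of the same stylesheet could be sketched as follows. The kml prefix is an arbitrary choice made here; the URI is copied from the sample input and must match the one in the actual file exactly. As before, disable-output-escaping is an optional feature and only has an effect when the processor itself serializes the result.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:kml="http://earth.google.com/kml/2.2/">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- drop every Snippet element -->
  <xsl:template match="kml:Snippet"/>

  <!-- restricted to Placemark descriptions, so the plain-text
       Document description passes through the identity template -->
  <xsl:template match="kml:Placemark/kml:description">
    <xsl:copy>
      <!-- length of the escaped text that precedes the table -->
      <xsl:variable name="len" select="string-length(substring-before(., '&lt;table'))"/>
      <!-- write everything from '<table' onwards as unescaped markup -->
      <xsl:value-of select="substring(., $len + 1)" disable-output-escaping="yes"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```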

michael.hor257k
  • Dear @michael.hor257k, when I use multiple rows I get an error; with a single row everything works. https://xsltfiddle.liberty-development.net/6r5Gh39/4 – Apopei Andrei Ionut Mar 23 '19 at 16:21
  • As the error message says, your input is not well-formed XML. An XML document MUST have a single root element. Note also that if your input is a KML document, you must account for the KML *namespace* - see, for example: https://stackoverflow.com/questions/34758492/xslt-transform-doesnt-work-until-i-remove-root-node/34762628#34762628 – michael.hor257k Mar 23 '19 at 16:26
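
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <!-- Assumes the input elements are in no namespace; for a real KML file,
         bind a prefix to the KML namespace and use it in every path below. -->
    <!-- Rebuild each Placemark, leaving out Snippet -->
    <xsl:template match="Placemark">
        <xsl:element name="Placemark">
            <xsl:copy-of select="name"/>
            <xsl:element name="description">
                <!-- length of the escaped text that precedes the table -->
                <xsl:variable name="finalLength" select="string-length(substring-before(description, '&lt;table'))" />
                <!-- write everything from '<table' onwards as unescaped markup -->
                <xsl:value-of select="substring(description, $finalLength + 1)" disable-output-escaping="yes"/>
            </xsl:element>
            <xsl:copy-of select="styleUrl"/>
            <xsl:copy-of select="Point"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>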

You can also use this stylesheet.