I have a CSV file with a list of numbers with gaps in them, like:

0001
0002
0003
0005
0007
etc.

I also have an XML file whose items carry identifiers that are numbered consecutively, without gaps, like:

<?xml version='1.0' encoding='utf-8'?>
<root>
    <item>
        <unitd>0001</unitd>
        <unittitle>description of item 1</unittitle>
    </item>
    <item>
        <unitd>0002</unitd>
        <unittitle>description of item 2</unittitle>
    </item>
    <item>
        <unitd>0003</unitd>
        <unittitle>description of item 3</unittitle>
    </item>
    <item>
        <unitd>0004</unitd>
        <unittitle>description of item 4</unittitle>
    </item>
    <item>
        <unitd>0005</unitd>
        <unittitle>description of item 5</unittitle>
    </item>
    <item>
        <unitd>0006</unitd>
        <unittitle>description of item 6</unittitle>
    </item>
    <item>
        <unitd>0007</unitd>
        <unittitle>description of item 7</unittitle>
    </item>
</root>      <!-- added by edit -->

I want to add an extra element to each item in the XML file whose identifier appears in the CSV file, like this:

<root>
<item>
    <unitd>0001</unitd>
    <unittitle>description of item 1</unittitle>
    <link>link to extra info on item 1</link>
</item>
<item>
    <unitd>0002</unitd>
    <unittitle>description of item 2</unittitle>
    <link>link to extra info on item 2</link>
</item>
<item>
    <unitd>0003</unitd>
    <unittitle>description of item 3</unittitle>
    <link>link to extra info on item 3</link>
</item>
<item>
    <unitd>0004</unitd>
    <unittitle>description of item 4</unittitle>
</item>
<item>
    <unitd>0005</unitd>
    <unittitle>description of item 5</unittitle>
    <link>link to extra info on item 5</link>
</item>
<item>
    <unitd>0006</unitd>
    <unittitle>description of item 6</unittitle>
</item>
<item>
    <unitd>0007</unitd>
    <unittitle>description of item 7</unittitle>
    <link>link to extra info on item 7</link>
</item>
</root>

Can I do this with Python, and if so, how? Or is there a smarter way to handle this?

zx485
Cannedit
  • Can *you*? Probably, since you asked this, "no". Can anyone else? Of course. Python is very well suited to this kind of tasks. – Jongware Mar 28 '18 at 20:32

1 Answer

The smartest way to handle an XML-to-XML transformation is XSLT, which was designed for exactly this purpose.

To transform your source XML into the desired output, you can use this XSLT 1.0 stylesheet (named trans.xslt):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8"/>
    <xsl:variable name="additional" select="document('a_link.xml')/LinkMapping" />   <!-- name of helper XML file -->

    <!-- identity template -->
    <xsl:template match="node()|@*" > 
        <xsl:copy>
            <xsl:apply-templates select="node()|@*" />
        </xsl:copy>
    </xsl:template> 

    <!-- item transform template -->
    <xsl:template match="item" > 
        <xsl:copy>
            <xsl:copy-of select="node()|@*" />
            <xsl:if test="$additional/map[@id=current()/unitd]">
                <link>
                    <xsl:value-of select="$additional/map[@id=current()/unitd]/text()" />
                </link>
            </xsl:if>
        </xsl:copy>
    </xsl:template> 

</xsl:stylesheet>

This stylesheet requires an additional XML file, a_link.xml, that maps each identifier from the CSV file to its link. Your sample CSV does not show such a mapping, but converting the CSV to something like the format below should be no problem.

<LinkMapping>
    <map id="0001">link to extra info on item 1</map>
    <map id="0002">link to extra info on item 2</map>
    <map id="0003">link to extra info on item 3</map>
    <map id="0005">link to extra info on item 5</map>
    <map id="0007">link to extra info on item 7</map>
</LinkMapping>
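
If it helps, here is a minimal sketch of how such a helper file could be generated from the CSV with Python's standard library. It assumes the CSV has one identifier per row and no link column, so the link text is a made-up placeholder; the function name `build_link_mapping` and the `link_template` parameter are hypothetical and only there for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_link_mapping(csv_text, link_template="link to extra info on item {n}"):
    """Build a <LinkMapping> element from CSV text with one identifier per row."""
    root = ET.Element("LinkMapping")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        unit_id = row[0].strip()
        entry = ET.SubElement(root, "map", id=unit_id)
        # Assumption: the sample CSV has no link column, so the text is a placeholder
        # built from the identifier without its leading zeros.
        entry.text = link_template.format(n=unit_id.lstrip("0") or "0")
    return root


# Sample data taken from the question; a real script would read the CSV file instead.
sample_csv = "0001\n0002\n0003\n0005\n0007\n"
mapping = build_link_mapping(sample_csv)
mapping_xml = ET.tostring(mapping, encoding="unicode")
print(mapping_xml)

# To use the result with trans.xslt, write it next to the stylesheet:
# ET.ElementTree(mapping).write("a_link.xml", encoding="utf-8")
```

If your real CSV has a second column with the actual link, you would use `row[1]` as the element text instead of the placeholder.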

The output of applying the above XSLT with the XML helper file is as desired.


So to use this with Python, you can refer to this SO answer which explains how to transform an XML file with XSLT.

Assuming that your XML file is named input.xml the code could look like this:

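import lxml.etree as ET

dom = ET.parse("input.xml")      # source XML
xslt = ET.parse("trans.xslt")    # stylesheet shown above
transform = ET.XSLT(xslt)        # compile the stylesheet
# document('a_link.xml') is resolved relative to the stylesheet's location
newdom = transform(dom)
# encoding="unicode" returns a str, so print() shows XML instead of a bytes repr
print(ET.tostring(newdom, pretty_print=True, encoding="unicode"))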

This should print the desired result.
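
If you would rather not use XSLT at all, the same idea can be sketched in plain Python with the standard library's `xml.etree.ElementTree`. This is only a rough alternative under the same assumptions as before: the link text is a placeholder because the sample CSV contains none, and the names `add_links` and `link_template` are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_links(root, ids, link_template="link to extra info on item {n}"):
    """Append a <link> child to every <item> whose <unitd> text is in ids."""
    for item in root.iter("item"):
        unit_id = (item.findtext("unitd") or "").strip()
        if unit_id in ids:
            link = ET.SubElement(item, "link")
            # Placeholder text (assumption): identifier without leading zeros.
            link.text = link_template.format(n=unit_id.lstrip("0") or "0")
    return root


# Shortened sample of the XML from the question.
sample_xml = """<root>
    <item><unitd>0001</unitd><unittitle>description of item 1</unittitle></item>
    <item><unitd>0004</unitd><unittitle>description of item 4</unittitle></item>
    <item><unitd>0005</unitd><unittitle>description of item 5</unittitle></item>
</root>"""
# Identifiers as they would be read from the CSV file.
csv_ids = {"0001", "0002", "0003", "0005", "0007"}

tree_root = add_links(ET.fromstring(sample_xml), csv_ids)
result_xml = ET.tostring(tree_root, encoding="unicode")
print(result_xml)
```

Keeping the identifiers in a set makes each lookup cheap, and comparing them as strings preserves the leading zeros.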

zx485