Howdie do,

So I have the following two XML files.

File A:

<?xml version="1.0" encoding="UTF-8"?>
<GetShipmentUpdatesResult>
    <Shipments>
        <Shipment>
            <Container>
                <OrderNumber>5108046</OrderNumber>
                <ContainerNumber>5108046_1</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-12T12:00:00</ShipDate>
                <CarrierName>UPS</CarrierName>
                <TrackingNumber>1ZX20520A803682850</TrackingNumber>
                <StatusCode>InTransit</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T13:53:18</TimeStamp>
                        <City></City>
                        <StateOrProvince></StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T18:47:44</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>Status: AF Recorded</Description>
                        <TrackingStatus>In Transit</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
        <Shipment>
            <Container>
                <OrderNumber>456789</OrderNumber>
                <ContainerNumber>44789</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-03T13:56:27</ShipDate>
                <CarrierName>UP2</CarrierName>
                <TrackingNumber>1Z4561230020</TrackingNumber>
                <StatusCode>IN_TRANSIT</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-07-03T13:56:27</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
    </Shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId></RequestId>
    <RecordsRemaining>0</RecordsRemaining>
</GetShipmentUpdatesResult>

File B:

<?xml version="1.0" encoding="UTF-8"?>
<getShipmentStatusResponse>
    <getShipmentStatusResult>
        <outcome>
            <result>Success</result>
            <error></error>
        </outcome>
        <shipments>
            <shipment>
                <orderID>123456</orderID>
                <containerNo>CD1863663C</containerNo>
                <shipDate>2015-06-29T18:47:44</shipDate>
                <carrier>UPS</carrier>
                <trackingNumber>1Z4561230001</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T13:53:18</timeStamp>
                        <city />
                        <state />
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T18:47:44</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Shipped from warehouse</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
            <shipment>
                <orderID>456789</orderID>
                <containerNo>44789</containerNo>
                <shipDate>2015-07-03T13:56:27</shipDate>
                <carrier>UP2</carrier>
                <trackingNumber>1Z4561230020</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-07-03T13:56:27</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
        </shipments>
        <matchingRecords>2</matchingRecords>
        <requestId></requestId>
        <remainingRecords>0</remainingRecords>
    </getShipmentStatusResult>
</getShipmentStatusResponse>

I basically need to read through File A and change it to look like File B. I've been using xmltodict to parse File A, but it only reads the top-level element. It seems I would have to write multiple nested for loops to achieve this with xmltodict: one loop to go through each parent element and then more for the child elements.

Looking at ElementTree, the situation appears to be the same. Does anyone know another way to do this without writing multiple for loops?

Jimmy

1 Answer

Since your output is more or less an exact mapping of the input (only the element names seem to differ), I suggest you use XSLT to do the transformation declaratively.

This assumes that each input element name maps unconditionally to exactly one output element name, which is what it looks like, judging by your sample. Here's an XSLT 1.0 transformation to get you started (basic instructions on how to use XSLT in Python can be found in this answer):

<xsl:transform version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="http://tempuri.org/config"
  exclude-result-prefixes="my"
>
  <xsl:output method="xml" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*" />

  <my:config>
    <nameMap from="Shipments" to="shipments" />
    <nameMap from="Shipment" to="shipment" />
    <nameMap from="Container" to="-" />
  </my:config>
  <xsl:variable name="nameMap" select="document('')/*/my:config/nameMap" />

  <xsl:template match="node() | @*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <getShipmentStatusResponse>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResponse>
  </xsl:template>

  <xsl:template match="GetShipmentUpdatesResult">
    <getShipmentStatusResult>
      <outcome>
        <result>Success</result>
        <error></error>
      </outcome>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResult>
  </xsl:template>

  <xsl:template match="*">
    <xsl:variable name="map" select="$nameMap[@from = name(current())]" />
    <xsl:choose>
      <xsl:when test="$map/@to = '-'">
        <xsl:apply-templates select="@* | node()" />
      </xsl:when>
      <xsl:when test="$map/@to != ''">
        <xsl:element name="{$map/@to}">
          <xsl:apply-templates select="@* | node()" />
        </xsl:element>
      </xsl:when>
      <xsl:when test="$map/@to = ''" />
      <xsl:otherwise>
        <xsl:call-template name="identity" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:transform>

The transformation approaches the problem as follows:

  • At its core, the identity transform is at work: Any node that does not match a specialized template will be copied to the output as-is.
  • It contains an in-place config section (<my:config>) where you can place <nameMap> elements for mapping input names to output names. This works through the following convention (implemented in the <xsl:template match="*"> a few lines down):

    • if an input element matches any @from and the @to is filled in, the element will be renamed and its children will be processed
    • if an input element matches any @from and the @to is '-', the element will be removed but its children will still be processed.
    • if an input element matches any @from and the @to is empty, it will be removed from the output completely
    • in all other cases the input element will be copied 1:1, via the identity template.
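
For readers who want to stay in Python, the same four-way convention can be sketched without XSLT. This is a minimal illustration using only the standard library's xml.etree.ElementTree, not the transformation above: `NAME_MAP` and `convert` are hypothetical names chosen for this example, whitespace-only text is dropped to roughly mimic `<xsl:strip-space>`, and the wrapper elements added by the two specialized templates are not reproduced.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table mirroring the <nameMap> rules:
#   "new-name" -> rename the element
#   "-"        -> drop the element, keep processing its children
#   ""         -> drop the element and everything below it
NAME_MAP = {
    "Shipments": "shipments",
    "Shipment": "shipment",
    "Container": "-",
}


def convert(elem, name_map):
    """Return the list of output elements produced for one input element."""
    # Names without a mapping fall through to themselves (copied 1:1).
    target = name_map.get(elem.tag, elem.tag)
    if target == "":
        return []

    # A single recursive call handles every nesting level.
    children = []
    for child in elem:
        children.extend(convert(child, name_map))

    if target == "-":
        # Unwrap: the element disappears, its converted children move up.
        return children

    out = ET.Element(target, dict(elem.attrib))
    # Keep real text, drop whitespace-only indentation.
    if elem.text and elem.text.strip():
        out.text = elem.text
    out.extend(children)
    return [out]


SAMPLE = (
    "<Shipments><Shipment><Container>"
    "<OrderNumber>5108046</OrderNumber>"
    "</Container></Shipment></Shipments>"
)

result = convert(ET.fromstring(SAMPLE), NAME_MAP)[0]
print(ET.tostring(result, encoding="unicode"))
```

Because `convert` returns a list, the three outcomes (one renamed element, several unwrapped children, or nothing) are handled uniformly by the caller. Text sitting directly inside an unwrapped element is discarded in this sketch, which is harmless for the sample data but would need handling for mixed content.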

Currently the output looks like this. Add more <nameMap> rules to define the behavior for the rest of the input elements.

<getShipmentStatusResponse>
  <getShipmentStatusResult>
    <outcome>
      <result>Success</result>
      <error />
    </outcome>
    <shipments>
      <shipment>
        <OrderNumber>5108046</OrderNumber>
        <ContainerNumber>5108046_1</ContainerNumber>
        <CustomerOrderNumber>abcq123</CustomerOrderNumber>
        <ShipDate>2015-07-12T12:00:00</ShipDate>
        <CarrierName>UPS</CarrierName>
        <TrackingNumber>1ZX20520A803682850</TrackingNumber>
        <StatusCode>InTransit</StatusCode>
        <Events>
          <TrackingEvent>
            <TimeStamp>2015-06-29T13:53:18</TimeStamp>
            <City />
            <StateOrProvince />
            <Description>manifested from Warehouse</Description>
            <TrackingStatus>Manifest</TrackingStatus>
          </TrackingEvent>
          <TrackingEvent>
            <TimeStamp>2015-06-29T18:47:44</TimeStamp>
            <City>Glenwillow</City>
            <StateOrProvince>OH</StateOrProvince>
            <Description>Status: AF Recorded</Description>
            <TrackingStatus>In Transit</TrackingStatus>
          </TrackingEvent>
        </Events>
      </shipment>
      <shipment>
        <OrderNumber>456789</OrderNumber>
        <ContainerNumber>44789</ContainerNumber>
        <CustomerOrderNumber>abcq123</CustomerOrderNumber>
        <ShipDate>2015-07-03T13:56:27</ShipDate>
        <CarrierName>UP2</CarrierName>
        <TrackingNumber>1Z4561230020</TrackingNumber>
        <StatusCode>IN_TRANSIT</StatusCode>
        <Events>
          <TrackingEvent>
            <TimeStamp>2015-07-03T13:56:27</TimeStamp>
            <City>Glenwillow</City>
            <StateOrProvince>OH</StateOrProvince>
            <Description>manifested from Warehouse</Description>
            <TrackingStatus>Manifest</TrackingStatus>
          </TrackingEvent>
        </Events>
      </shipment>
    </shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId />
    <RecordsRemaining>0</RecordsRemaining>
  </getShipmentStatusResult>
</getShipmentStatusResponse>
Tomalak
  • Wow I've never heard of XSLT before, but based off your initial results, this seems like the item that I need to use. Thank you SOOOOO much – Jimmy Aug 07 '15 at 02:02
  • Hey, I think I pretty much understand XSLT and XPath now, but the one item I can't figure out in your code is `select="document('')/*/my:config/nameMap"`. I just don't get where `document('')` comes from. Is that a predefined function? – Jimmy Aug 07 '15 at 03:24
  • Yes, it's the self-reference. Since XSLT programs are themselves XML documents as well, they can be treated as such by the XSLT engine. `document('')` accesses the currently running XSLT program. – Tomalak Aug 07 '15 at 05:46
  • Thank you for the clarification. The only thing that I'm confused about now is how your code is actually removing the items. For example, when the test="$map/@to = ''" matches, how does it know to remove that item? Thank you again for your help – Jimmy Aug 07 '15 at 16:39