0

I am using lxml to read my xml file with:

tree = etree.parse(r'C:\Users\xxx\Desktop\misc work\xmledit\SalesTransactionCustom.xml')

And get an xml file like:

<?xml version="1.0" encoding="UTF-8"?>
<ProcessSalesTransactionCustom xmlns="http://schema.xxxx.com/xxxxx/2" releaseID="9.2">
  <ApplicationArea>
    <Sender>
      <LogicalID>xxxxxx.file.syncxxxxx5salesinvoice</LogicalID>
      <ComponentID>External</ComponentID>
      <ConfirmationCode>OnError</ConfirmationCode>
    </Sender>
    <CreationDateTime>2020-04-16T14:50:26.976Z</CreationDateTime>
    <BODID>xxxx-nid:xxxxx:1001::Default_1001#320000:?SalesTransactionCustom&amp;verb=Process</BODID>
  </ApplicationArea>
  <DataArea>
    <Process>
      <TenantID>xxx</TenantID>
      <AccountingEntityID>4710</AccountingEntityID>
      <LocationID>S_4710</LocationID>
      <ActionCriteria>
        <ActionExpression actionCode="Add"/>
      </ActionCriteria>
    </Process>
    <SalesTransactionCustom>
      <FinancialBatch>
        <TransactionDate>2019-09-27T00:00:00</TransactionDate>
        <BatchReference>KUKS_20190928052427</BatchReference>
      </FinancialBatch>
      <TransactionHeader>
        <TransactionType>HEI</TransactionType>
        <SalesInvoice>
          <Invoice>19001160</Invoice>
          <BusinessPartner>417B00</BusinessPartner>
          <DocumentDate>2019-09-27T00:00:00</DocumentDate>
          <DueDate>2019-11-20T00:00:00</DueDate>
          <Amount>152248.80</Amount>
          <Currency>EUR</Currency>
          <TaxCountry>DK</TaxCountry>
          <TaxCode>BESIT</TaxCode>
          <NonFinalizedTransaction>
            <TransactionReference>417B00 PC210LCI-11</TransactionReference>
            <LedgerAccount>50000400</LedgerAccount>
            <Dimension1>100</Dimension1>
            <Dimension2>KUK</Dimension2>
            <Dimension3/>
            <Dimension4/>
            <Dimension5/>
            <Dimension6/>
            <Dimension7/>
            <Dimension8/>
            <TaxAmount>0.00</TaxAmount>
            <DebitCreditFlag>credit</DebitCreditFlag>
            <Amount>152248.80</Amount>
          </NonFinalizedTransaction>
        </SalesInvoice>
      </TransactionHeader>
      <TransactionHeader>
        <TransactionType>HEI</TransactionType>
        <SalesInvoice>
          <Invoice>19001161</Invoice>
          <BusinessPartner>412600</BusinessPartner>
          <DocumentDate>2019-09-27T00:00:00</DocumentDate>
          <DueDate>2019-11-20T00:00:00</DueDate>
          <Amount>113848.17</Amount>
          <Currency>EUR</Currency>
          <TaxCountry>AT</TaxCountry>
          <TaxCode>GBSI</TaxCode>
          <NonFinalizedTransaction>
            <TransactionReference>412600 PC210NLC-11</TransactionReference>
            <LedgerAccount>50000400</LedgerAccount>
            <Dimension1>100</Dimension1>
            <Dimension2>KUK</Dimension2>
            <Dimension3/>
            <Dimension4/>
            <Dimension5/>
            <Dimension6/>
            <Dimension7/>
            <Dimension8/>
            <TaxAmount>0.00</TaxAmount>
            <DebitCreditFlag>credit</DebitCreditFlag>
            <Amount>113848.17</Amount>
          </NonFinalizedTransaction>
        </SalesInvoice>
      </TransactionHeader>
    </SalesTransactionCustom>
  </DataArea>
</ProcessSalesTransactionCustom>

I have a pandas dataframe like (here the first row are column names):

Tag             Old Value   New Value
BusinessPartner 417B00      BPE000104
BusinessPartner 412600      BPE000153
LedgerAccount   50000400    108092200

I want to replace the attributes of the elements in the xml with reference to this pandas dataframe. I want to be able to find the combination of tag and old value and replace the attribute with the new value. I also need to be able to write the edited text back to disk as an XML.

How can I do this with lxml and pandas?

Thank you in advance

Edit: Here is the code that works thanks to @Partha Mandal

import pandas as pd
from lxml import etree

df=pd.read_excel("Sample.xlsx")
df.columns=['Tag','Old','New']
df['Old'] = df['Old'].astype(str)
df['New'] = df['New'].astype(str)

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(r'C:\Users\xxx\Desktop\misc work\xmledit\testxml2.xml',parser)
string = etree.tostring(tree)
string = bytes.decode(string)

tag = df.Tag; old = df.Old; new = df.New

for i in range(len(tag)):
   string = string.replace("<"+tag[i]+">"+old[i]+"</"+tag[i]+">","<"+tag[i]+">"+new[i]+"</"+tag[i]+">")

string=str.encode(string)

root = etree.fromstring(string)
my_tree = etree.ElementTree(root)
with open('testxml2.xml', 'wb') as f:
    f.write(etree.tostring(my_tree))
Thelonious Monk
  • 426
  • 5
  • 19

2 Answers2

1

Why not just read the XML as a string and do str.replace?

tag = df.Tag; old = df.Old; new = df.New

for i in range(len(tag)):
   _str = _str.replace("<"+tag[i]+">"+old[i]+"</"+tag[i]+">","<"+tag[i]+">"+new[i]+"</"+tag[i]+">")
Partha Mandal
  • 1,391
  • 8
  • 14
  • I added this code to read it as a string - 'string = etree.tostring(tree)'. Then I ran the for loop in your comment but without the underscore (as I am not sure what that is for and don't have such a variable defined). So i run the for loop with 'string' instead of '_str' but get the following error : TypeError: a bytes-like object is required, not 'str' – Thelonious Monk Apr 24 '20 at 18:45
  • 1
    Can you run a `string = bytes.decode(string)` before running the `str.replace`? – Partha Mandal Apr 24 '20 at 19:16
  • Thanks. I did as you said and the string is now edited. However I also need to write this back to disk as an XML. (I have edited the question) I am trying the code - `with open('sample2.xml', 'wb') as f: string.write(f, encoding="utf-8", xml_declaration=True, pretty_print=True)` but get the error 'bytes' object has no attribute 'write'. (I tried again to decode into string but get the same error with 'str' instead of 'bytes') – Thelonious Monk Apr 24 '20 at 19:55
  • 1
    Oh! Shouldn't you use `f.write...`, that might be the issue there? – Partha Mandal Apr 24 '20 at 20:07
0

Because you use lxml, consider XSLT, the special purpose language designed to transform XML files to different XML and supports passing parameters from top layer such as Python. Therefore, integrate parameterization in a loop across data frame records:

The only challenge is to hard-code all unique values of Tag into second template match of XSLT (line breaks after pipe is fine):

doc:BusinessPartner|doc:LedgerAccount

which you can do with

"|".join(['doc:'+ val for val in df['Tag'].unique()])

"|\n".join(['doc:'+ val for val in df['Tag'].unique()])

XSLT (save as .xsl, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:doc="http://schema.xxxx.com/xxxxx/2">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- PARAMETERS -->
  <xsl:param name="tag" />
  <xsl:param name="old_value" />
  <xsl:param name="new_value" />

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- CONDITIONALL ASSIGN PARAMS --> 
  <xsl:template match="doc:BusinessPartner|doc:LedgerAccount">
    <xsl:choose>
        <xsl:when test = "text() = $old_value">
            <xsl:copy>
                <xsl:value-of select="$new_value"/>
            </xsl:copy>
        </xsl:when>
        <xsl:otherwise>
            <xsl:copy>
                <xsl:value-of select="text()"/>
            </xsl:copy>
        </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Python

import pandas as pd
import lxml.etree as et

df = pd.read_csv(...)

# LOAD XML AND XSL SCRIPT
xml = et.parse('Input.xml')
xsl = et.parse('Script.xsl')
transform = et.XSLT(xsl)

# PASS PARAMETER TO XSLT
df_list = df.to_dict('records')

for v in df_list:   
    result = transform(xml, tag = et.XSLT.strparam(v['Tag']), 
                            old_value = et.XSLT.strparam(v['Old Value']), 
                            new_value = et.XSLT.strparam(v['New Value']))

    xml = result

# SAVE TO NEW XML
with open("Output.xml", 'wb') as f:
    f.write(result)

XML Output

<?xml version="1.0"?>
<ProcessSalesTransactionCustom xmlns="http://schema.xxxx.com/xxxxx/2" releaseID="9.2">
  <ApplicationArea>
    <Sender>
      <LogicalID>xxxxxx.file.syncxxxxx5salesinvoice</LogicalID>
      <ComponentID>External</ComponentID>
      <ConfirmationCode>OnError</ConfirmationCode>
    </Sender>
    <CreationDateTime>2020-04-16T14:50:26.976Z</CreationDateTime>
    <BODID>xxxx-nid:xxxxx:1001::Default_1001#320000:?SalesTransactionCustom&amp;verb=Process</BODID>
  </ApplicationArea>
  <DataArea>
    <Process>
      <TenantID>infor</TenantID>
      <AccountingEntityID>4710</AccountingEntityID>
      <LocationID>S_4710</LocationID>
      <ActionCriteria>
        <ActionExpression actionCode="Add"/>
      </ActionCriteria>
    </Process>
    <SalesTransactionCustom>
      <FinancialBatch>
        <TransactionDate>2019-09-27T00:00:00</TransactionDate>
        <BatchReference>KUKS_20190928052427</BatchReference>
      </FinancialBatch>
      <TransactionHeader>
        <TransactionType>HEI</TransactionType>
        <SalesInvoice>
          <Invoice>19001160</Invoice>
          <BusinessPartner>BPE000104</BusinessPartner>
          <DocumentDate>2019-09-27T00:00:00</DocumentDate>
          <DueDate>2019-11-20T00:00:00</DueDate>
          <Amount>152248.80</Amount>
          <Currency>EUR</Currency>
          <TaxCountry>DK</TaxCountry>
          <TaxCode>BESIT</TaxCode>
          <NonFinalizedTransaction>
            <TransactionReference>417B00 PC210LCI-11</TransactionReference>
            <LedgerAccount>108092200</LedgerAccount>
            <Dimension1>100</Dimension1>
            <Dimension2>KUK</Dimension2>
            <Dimension3/>
            <Dimension4/>
            <Dimension5/>
            <Dimension6/>
            <Dimension7/>
            <Dimension8/>
            <TaxAmount>0.00</TaxAmount>
            <DebitCreditFlag>credit</DebitCreditFlag>
            <Amount>152248.80</Amount>
          </NonFinalizedTransaction>
        </SalesInvoice>
      </TransactionHeader>
      <TransactionHeader>
        <TransactionType>HEI</TransactionType>
        <SalesInvoice>
          <Invoice>19001161</Invoice>
          <BusinessPartner>BPE000153</BusinessPartner>
          <DocumentDate>2019-09-27T00:00:00</DocumentDate>
          <DueDate>2019-11-20T00:00:00</DueDate>
          <Amount>113848.17</Amount>
          <Currency>EUR</Currency>
          <TaxCountry>AT</TaxCountry>
          <TaxCode>GBSI</TaxCode>
          <NonFinalizedTransaction>
            <TransactionReference>412600 PC210NLC-11</TransactionReference>
            <LedgerAccount>108092200</LedgerAccount>
            <Dimension1>100</Dimension1>
            <Dimension2>KUK</Dimension2>
            <Dimension3/>
            <Dimension4/>
            <Dimension5/>
            <Dimension6/>
            <Dimension7/>
            <Dimension8/>
            <TaxAmount>0.00</TaxAmount>
            <DebitCreditFlag>credit</DebitCreditFlag>
            <Amount>113848.17</Amount>
          </NonFinalizedTransaction>
        </SalesInvoice>
      </TransactionHeader>
    </SalesTransactionCustom>
  </DataArea>
</ProcessSalesTransactionCustom>
Parfait
  • 104,375
  • 17
  • 94
  • 125