0

I am trying rewrite my code written in vba to python to generate the xml is below format shown in the image below using python.

Sample Data

ORDER_RELEASE_GID       PTA
XXXXXXXXXXX.254687058   15/11/2019
XXXXXXXXXXXXX.8000337937    10/10/2019
XXXXXXXXXXXXX.4501222542    27/9/2019
XXXXXXXXXXXXX.4501221219    1/9/2019
XXXXXXXXXXXXX.4501220248    8/9/2019
XXXXXXXXXXXXX.8000337932    
XXXXXXXXXXXXX.8000338393    22/10/2019
XXXXXXXXXXXXX.8000338445    10/9/2019
XXXXXXXXXXXXX.4501221982    9/9/2019
XXXXXXXXXXXXX.256535442 9/9/2019
XXXXXXXXXXXXX.4501220239    1/9/2019
XXXXXXXXXXXXX.4501221187    18/10/2019
XXXXXXXXXXXXX.4501220259    18/10/2019

My VBA Code

Public Const ORDER_ID = "A"
Public Const PTA = "B"
Sub GenerateXML()
Dim i As Integer
Dim row As Long
Dim cur_order As String   
Dim xmlFile As String
Dim path As String
Dim oWSS As Object
Dim glogdate As String
Dim StatusDate As Date

Set oWSS = CreateObject("WScript.Shell")
glogdate = Format(Now(), "yyyyMMddhhmmss")    
xmlFile = ActiveWorkbook.path & "\xxxxxXXXXX_" & xxxxxxxx& ".xml"
With Worksheets("Sheet1")
    row = .Cells(.Rows.Count, "A").End(xlUp).row
End With


i = 2

Open xmlFile For Output As #1

'TRANSMISSION HEADER
Print #1, "<?xml version=" & Chr(34) & "1.0" & Chr(34) & " encoding=" & Chr(34) & "ISO-8859-1" & Chr(34) & "?>"
Print #1, "<Transmission>"
Print #1, " <TransmissionHeader>"
Print #1, "     <Version>6.4</Version>"
Print #1, "     <TransmissionCreateDt>"
Print #1, "         <GLogDate>" & glogdate & "</GLogDate>"
Print #1, "     </TransmissionCreateDt>"
Print #1, "     <UserName>xxxxxxxxxxxx</UserName>"
Print #1, "     <Password>xxxxx</Password>"
Print #1, "     <SenderTransmissionNo>xxxxxxxxxxxx_" & glogdate & "</SenderTransmissionNo>"
'Print #1, "     <TransmissionType>STAGING</TransmissionType>"
Print #1, "     <AckSpec>"
Print #1, "         <ComMethodGid>"
Print #1, "             <Gid>"
Print #1, "             <Xid>EMAIL</Xid>"
Print #1, "             </Gid>"
Print #1, "         </ComMethodGid>"
Print #1, "         <EmailAddress>xxxx.com</EmailAddress>"
Print #1, "         <AckOption>ERROR</AckOption>"
Print #1, "      </AckSpec>"
Print #1, " </TransmissionHeader>"
Print #1, "<TransmissionBody>"

Do While i <= row


If (Worksheets("Sheet1").Range(PTA & i).Value) = "" Then
 i = i + 1
Else

'STATUS
Print #1, " <GLogXMLElement>    "
Print #1, "     <GenericStatusUpdate>   "
Print #1, "         <GenericStatusObjectType>ORDER_RELEASE</GenericStatusObjectType>   "
Print #1, "             <Gid>   "
Print #1, "                 <DomainName>xxx</DomainName> "
Print #1, "                 <Xid>" & Mid(Worksheets("Sheet1").Range(ORDER_ID & i).Value, 12) & "</Xid>   "
Print #1, "             </Gid>  "
Print #1, "         <TransactionCode>U</TransactionCode>   "
Print #1, "         <Refnum>  "
Print #1, "             <RefnumQualifierGid>   "
Print #1, "             <Gid> "
Print #1, "                 <DomainName>XXXXXXXXXXXXX</DomainName> "
Print #1, "                 <Xid>XXXXXX</Xid> "
Print #1, "             </Gid>  "
Print #1, "             </RefnumQualifierGid>   "
Print #1, "             <RefnumValue>" & Format$(Worksheets("Sheet1").Range(PTA & i).Value, "dd-mm-yyyy") & "</RefnumValue>   "
Print #1, "         </Refnum>  "
Print #1, "     </GenericStatusUpdate>   "
Print #1, " </GLogXMLElement>    "

i = i + 1

End If

Loop

Print #1, "</TransmissionBody>"
Print #1, "</Transmission>"
Close #1

End Sub

I am tried writing Python code but not able to generate custom xml like vba. I tried using dicttoxl and lxml function genreate the xml formats but the not able create custom xml.

How can this be done?

My python code

df.to_csv('GGG.csv',index=False)
import csv
import lxml.etree as ET

headers = ['ORDER_RELEASE_GID','PTA']

# INITIALIZING XML FILE
root = ET.Element('root')

# READING CSV FILE AND BUILD TREE
with open('GSK.csv') as f:
    next(f)                             # SKIP HEADER
    csvreader = csv.reader(f)

    for row in csvreader:        
        data = ET.SubElement(root, "data")
        for col in range(len(headers)):
            node = ET.SubElement(data, headers[col]).text = str(row[col])

# SAVE XML TO FILE
tree_out = (ET.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8"))

# OUTPUTTING XML CONTENT TO FILE
with open('Output.xml', 'wb') as f:
    f.write(tree_out)
pankaj
  • 420
  • 1
  • 8
  • 26
  • Moderators can redact sensitive information from a post's edit history. I have submitted a redaction request for this answer, removing the email address, user name, password, and domain name. In the future, if you need to redact some accidentally-exposed information, [edit out *just* the sensitive information and then flag the post for moderator attention, asking us to redact the information as you've just done in an edit](https://meta.stackoverflow.com/q/258066). Do *not* attempt to obliterate the entire post. That's seen as vandalism, and it is going to attract negative attention. – Cody Gray - on strike Oct 15 '19 at 18:53

1 Answers1

0

This is a straightforward translation of your VBA code: I took the header, body and footer output from the VBA and inserted the formatted values at the necessary places. Empty date rows are dropped before processing the dataframe.

header = """<?xml version="1.0" encoding="ISO-8859-1"?>
<Transmission>
 <TransmissionHeader>
     <Version>6.4</Version>
     <TransmissionCreateDt>
         <GLogDate>{0}</GLogDate>
     </TransmissionCreateDt>
     <UserName>user</UserName>
     <Password>password</Password>
     <SenderTransmissionNo>GSK_PTA_UPLOAD_{0}</SenderTransmissionNo>
     <TransmissionType>STAGING</TransmissionType>
     <AckSpec>
         <ComMethodGid>
             <Gid>
             <Xid>EMAIL</Xid>
             </Gid>
         </ComMethodGid>
         <EmailAddress>user@example.com</EmailAddress>
         <AckOption>ERROR</AckOption>
      </AckSpec>
 </TransmissionHeader>
<TransmissionBody>
"""

body = """ <GLogXMLElement>    
     <GenericStatusUpdate>   
         <GenericStatusObjectType>ORDER_RELEASE</GenericStatusObjectType>   
             <Gid>   
                 <DomainName>example.com</DomainName> 
                 <Xid>{}</Xid>   
             </Gid>  
         <TransactionCode>U</TransactionCode>   
         <Refnum>  
             <RefnumQualifierGid>   
             <Gid> 
                 <DomainName>example.com</DomainName> 
                 <Xid>PTA</Xid> 
             </Gid>  
             </RefnumQualifierGid>   
             <RefnumValue>{}</RefnumValue>   
         </Refnum>  
     </GenericStatusUpdate>   
 </GLogXMLElement>
"""

footer = """</TransmissionBody>
</Transmission>
"""
import pandas as pd
import datetime

//header body footer defined above     

df = pd.read_excel('gsk_pta_20190828.xlsm')
df = df.dropna()
glogdate = datetime.datetime.now().strftime('%Y%m%d%H%M%S')

with open('gsk_pta_20190828.xlm', 'w') as f:
    f.write(header.format(glogdate))
    for row in df.itertuples():
       f.write(body.format(row[1][11:], row[2].strftime('%d-%m-%Y')))
    f.write(footer)

If itertuples with writing each individual line is too slow for large files, you might try to first create a list of the bodies and then write it in one function call as demonstrated here.

Ömer Erden
  • 7,680
  • 5
  • 36
  • 45
Stef
  • 28,728
  • 2
  • 24
  • 52
  • This means that the second column is of string instead of datatime type. This is strange because when I imported your Excel file it correctly converted to datetime. In your case just add `df.PTA = pd.to_datetime(df.PTA)` after the `read_excel` line – Stef Sep 09 '19 at 12:05
  • This means that your first column is obviously of int type instead of str. My code works perfectly without any modifications for the sample data you provided. Can you upload the Excel file that causes the problems? – Stef Sep 09 '19 at 12:19
  • Yes, it does: read_excel gets the correct types from the xlsm file while for csv the types are inferred during import unless you specify the types. If you upload a sample csv I can adapt the code. – Stef Sep 09 '19 at 12:40
  • for the sample csv data use `df = pd.read_csv('GK.csv', usecols=[1,2], parse_dates=[1])` or alternatively `df = pd.read_csv('GK.csv', index_col=0, parse_dates=[2])` – Stef Sep 09 '19 at 13:26
  • can you paste the output you recieved as well – pankaj Sep 09 '19 at 13:27
  • I've seen you unaccepted and downvoted the answer. What's wrong with it suddenly. Do you need further assistance? @Cody-gray removed some pieces of information from the code you supplied and I used in my answer. Was this your only concern? – Stef Oct 27 '19 at 09:49