3

I'm trying to create an XML file that uses a Pandas dataframe to populate the elements and subelements. Here is the code I have written:

import pandas as pd
from lxml import etree as et

df = pd.DataFrame({'id_profile': [439, 444654, 56454, 12222], 'ServiceDate': 
['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'], 
'PrimaryServiceCategory': [25, 25, 33, 25]})

root = et.Element('ClientReport')
idnum = et.SubElement(root, 'ID')
prime_serv = et.SubElement(root, 'ServiceCategory')
serv_date = et.SubElement(root, 'ServiceDate')

for row in df.iterrows():
    idnum.text = df['id_profile']
    prime_serv.text = df['PrimaryServiceCategory']
    serv_date.text = df['ServiceDate']

print(et.tostring(root, pretty_print=True))

My expected outcome is:

<ClientReport>
   <ID>439</ID>
   <ServiceCategory>25</ServiceCategory>
   <ServiceDate>2017-12-05</ServiceDate>
</ClientReport>
<ClientReport>
   <ID>444654</ID>
   <ServiceCategory>25</ServiceCategory>
   <ServiceDate>2017-01-25</ServiceDate>
</ClientReport>
<ClientReport>
   <ID>12222</ID>
   <ServiceCategory>25</ServiceCategory>
   <ServiceDate>2017-01-25</ServiceDate>
</ClientReport>

Instead, I get:

TypeError: Argument must be bytes or unicode, got 'Series'

I'm not sure how to get iterated rows, rather than a static value, to populate the XML file's attributes. Is Pandas the right package here? And is LXML the right one as well?

eyllanesc
  • 235,170
  • 19
  • 170
  • 241
EduardoX
  • 81
  • 2
  • 9
  • Does this answer your question? [How do convert a pandas/dataframe to XML?](https://stackoverflow.com/questions/18574108/how-do-convert-a-pandas-dataframe-to-xml) – iacob Mar 25 '21 at 09:27

1 Answers1

2

One solution is to iterate over the dataframe, populating each XML element individually:

import pandas as pd
from lxml import etree as et

df = pd.DataFrame({'id_profile': [439, 444654, 56454, 12222], 'ServiceDate':
['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'],
'PrimaryServiceCategory': [25, 25, 33, 25]})

root = et.Element('root')

for row in df.iterrows():
    report = et.SubElement(root, 'ClientReport')
    idnum = et.SubElement(report, 'ID')
    prime_serv = et.SubElement(report, 'ServiceCategory')
    serv_date = et.SubElement(report, 'ServiceDate')
    idnum.text = str(row[1]['id_profile'])
    prime_serv.text = str(row[1]['PrimaryServiceCategory'])
    serv_date.text = str(row[1]['ServiceDate'])

print(et.tostring(root, pretty_print=True).decode('utf-8'))
Robᵩ
  • 163,533
  • 20
  • 239
  • 308