30

Is there a simple way to take a pandas/df table:

field_1 field_2 field_3 field_4
cat     15,263  2.52    00:03:00
dog     1,652   3.71    00:03:47
test     312    3.27    00:03:41
book     300    3.46    00:02:40

And convert it to XML along the lines of:

<item>
  <field name="field_1">cat</field>
  <field name="field_2">15263</field>
  <field name="filed_3">2.52</field>

...

<item>
      <field name="field_1">dog</field>

etc.

iacob
  • 20,084
  • 6
  • 92
  • 119
user7289
  • 32,560
  • 28
  • 71
  • 88

4 Answers4

36

You can create a function that creates the item node from a row in your DataFrame:

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

And then apply the function along the axis=1.

>>> print '\n'.join(df.apply(func, axis=1))
<item>
  <field name="field_1">cat</field>
  <field name="field_2">15,263</field>
  <field name="field_3">2.52</field>
  <field name="field_4">00:03:00</field>
</item>
<item>
  <field name="field_1">dog</field>
  <field name="field_2">1,652</field>
  <field name="field_3">3.71</field>
  <field name="field_4">00:03:47</field>
</item>
...
Viktor Kerkez
  • 45,070
  • 12
  • 104
  • 85
  • This is a very neat solution. – Andy Hayden Sep 02 '13 at 15:35
  • 3
    Problem with this is that if you have characters in the InnerText that are special characters, it wont escape them. There are a multitude of XML standards to follow here. If you have very well known data, no punctuation, then this will work. Otherwise if you data has some text like 'shoe lace count > 2', this will throw off any XML reader. A good XML library will encode that to 'shoe lace count > 2'. – Vinnie Amir Aug 07 '18 at 00:05
  • I am trying to_xml() on a pandas dataframe with 1 million rows and it doesn't work... Is there an optimised version for this case? – Francesco Pegoraro Jun 27 '19 at 10:14
  • I am getting AttributeError: 'function' object has no attribute 'apply'. – earl Oct 15 '19 at 06:55
  • What if you had embedded objects? Element in element in element? – Timbus Calin Jan 31 '20 at 09:20
  • @ArvinAmir: ran into exactly this issue and it was a tough find (and obvious in hindsight..grr.) Using https://wiki.python.org/moin/EscapingXml to escape all special elements in the dataframe column before passing it to an xml converter. – Mitalee Rao Sep 15 '20 at 01:23
25

To expand on Viktor's excellent answer (and tweaking it slightly to work with duplicate columns), you could set this up as a to_xml DataFrame method:

def to_xml(df, filename=None, mode='w'):
    def row_to_xml(row):
        xml = ['<item>']
        for i, col_name in enumerate(row.index):
            xml.append('  <field name="{0}">{1}</field>'.format(col_name, row.iloc[i]))
        xml.append('</item>')
        return '\n'.join(xml)
    res = '\n'.join(df.apply(row_to_xml, axis=1))

    if filename is None:
        return res
    with open(filename, mode) as f:
        f.write(res)

pd.DataFrame.to_xml = to_xml

Then you can print the xml:

In [21]: print df.to_xml()
<item>
  <field name="field_1">cat</field>
  <field name="field_2">15,263</field>
  <field name="field_3">2.52</field>
  <field name="field_4">00:03:00</field>
</item>
<item>
...

or save it to a file:

In [22]: df.to_xml('foo.xml')

Obviously this example should be tweaked to fit your xml standard.

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
17

You can use the xml.etree.ElementTree package to generate a read-friendly format in a very few lines of code.

root = etree.Element('data');

for i,row in dframe.iterrows():
    item = etree.SubElement(root, 'item', attrib=row.to_dict());

etree.dump(root);

This will create a XML Tree (under root), where each row will will be of type item, and have attributes for all columns. You can create a more nested tree with columns as well by creating a subelement for each field.

Then you can also read the xml file back in Python using the ElementTree package:

xml.etree.ElementTree.parse('xml_file.xml');
sparkonhdfs
  • 1,313
  • 2
  • 17
  • 31
9

As of v1.3, you can simply use:

df.to_xml()
iacob
  • 20,084
  • 6
  • 92
  • 119