How do convert a pandas dataframe to XML?

Question

Is there a simple way to take a pandas/df table:

field_1 field_2 field_3 field_4
cat     15,263  2.52    00:03:00
dog     1,652   3.71    00:03:47
test     312    3.27    00:03:41
book     300    3.46    00:02:40

And convert it to XML along the lines of:

<item>
  <field name="field_1">cat</field>
  <field name="field_2">15263</field>
  <field name="filed_3">2.52</field>

...

<item>
      <field name="field_1">dog</field>

etc.

hmmm seems to_xml is a strange omission, there is to_html, to_json etc. Perhaps this would make a good enhancement request. — Andy Hayden, Sep 02 '13 at 14:38
@Jeff Probably not, I think Viktor's recipe can probably be tweaked to a user's specific needs. — Andy Hayden, Sep 02 '13 at 18:03

score 36 · Accepted Answer · answered Sep 02 '13 at 14:56

36

You can create a function that creates the item node from a row in your DataFrame:

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

And then apply the function along the axis=1.

>>> print '\n'.join(df.apply(func, axis=1))
<item>
  <field name="field_1">cat</field>
  <field name="field_2">15,263</field>
  <field name="field_3">2.52</field>
  <field name="field_4">00:03:00</field>
</item>
<item>
  <field name="field_1">dog</field>
  <field name="field_2">1,652</field>
  <field name="field_3">3.71</field>
  <field name="field_4">00:03:47</field>
</item>
...

answered Sep 02 '13 at 14:56

Viktor Kerkez

45,070
12
104
85

This is a very neat solution. – Andy Hayden Sep 02 '13 at 15:35
3

Problem with this is that if you have characters in the InnerText that are special characters, it wont escape them. There are a multitude of XML standards to follow here. If you have very well known data, no punctuation, then this will work. Otherwise if you data has some text like 'shoe lace count > 2', this will throw off any XML reader. A good XML library will encode that to 'shoe lace count > 2'. – Vinnie Amir Aug 07 '18 at 00:05
I am trying to_xml() on a pandas dataframe with 1 million rows and it doesn't work... Is there an optimised version for this case? – Francesco Pegoraro Jun 27 '19 at 10:14
I am getting AttributeError: 'function' object has no attribute 'apply'. – earl Oct 15 '19 at 06:55
What if you had embedded objects? Element in element in element? – Timbus Calin Jan 31 '20 at 09:20
@ArvinAmir: ran into exactly this issue and it was a tough find (and obvious in hindsight..grr.) Using https://wiki.python.org/moin/EscapingXml to escape all special elements in the dataframe column before passing it to an xml converter. – Mitalee Rao Sep 15 '20 at 01:23

score 25 · Answer 2 · answered Sep 02 '13 at 18:36

To expand on Viktor's excellent answer (and tweaking it slightly to work with duplicate columns), you could set this up as a to_xml DataFrame method:

def to_xml(df, filename=None, mode='w'):
    def row_to_xml(row):
        xml = ['<item>']
        for i, col_name in enumerate(row.index):
            xml.append('  <field name="{0}">{1}</field>'.format(col_name, row.iloc[i]))
        xml.append('</item>')
        return '\n'.join(xml)
    res = '\n'.join(df.apply(row_to_xml, axis=1))

    if filename is None:
        return res
    with open(filename, mode) as f:
        f.write(res)

pd.DataFrame.to_xml = to_xml

Then you can print the xml:

In [21]: print df.to_xml()
<item>
  <field name="field_1">cat</field>
  <field name="field_2">15,263</field>
  <field name="field_3">2.52</field>
  <field name="field_4">00:03:00</field>
</item>
<item>
...

or save it to a file:

In [22]: df.to_xml('foo.xml')

Obviously this example should be tweaked to fit your xml standard.

What would be a similar solution on dataframes? Not using pandas. — earl, Oct 14 '19 at 05:11

score 17 · Answer 3 · answered Feb 05 '17 at 16:09

You can use the xml.etree.ElementTree package to generate a read-friendly format in a very few lines of code.

root = etree.Element('data');

for i,row in dframe.iterrows():
    item = etree.SubElement(root, 'item', attrib=row.to_dict());

etree.dump(root);

This will create a XML Tree (under root), where each row will will be of type item, and have attributes for all columns. You can create a more nested tree with columns as well by creating a subelement for each field.

Then you can also read the xml file back in Python using the ElementTree package:

xml.etree.ElementTree.parse('xml_file.xml');

Nice! etree.dump(root) prints a string. How do I get a nice xml format with this solution? — peter_parker, May 31 '21 at 15:16

score 9 · Answer 4 · answered Mar 25 '21 at 09:28

9

As of v1.3, you can simply use:

df.to_xml()

answered Mar 25 '21 at 09:28

iacob

20,084
6
92
119

How do convert a pandas dataframe to XML?

4 Answers4

Linked