DataFrame to XML in Python without iteration?

Input Dataframe:

    A   B   C   D
    aa  ab  ac  ad
    aaa abb acc add

Output in XML:

    <A>aa</A>
    <B>ab</B>
    <C>ac</C>
    <D>ad</D>
    <A>aaa</A>
    <B>abb</B>
    <C>acc</C>
    <D>add</D>

1 Answer

Given a DataFrame x:

>>> import pandas as pd
>>> x = pd.DataFrame([['aa','ab','ac','ad'],['aaa','abb','acc','add']],columns=['A','B','C','D'])
>>> x
     A    B    C    D
0   aa   ab   ac   ad
1  aaa  abb  acc  add

You can use the following function, which contains no explicit Python loop. However, there is no guarantee that the pandas and NumPy functions it calls do not loop internally.

>>> import numpy as np
>>> def to_xml(df):
...     # repeat the column names once per row so they line up with the flattened values
...     cols = df.columns.tolist() * len(df.index)
...     # convert the DataFrame to a NumPy array and flatten it row by row into one vector
...     df_numpy = df.to_numpy().reshape(-1)
...     # build a two-row frame (values, column names) and format each cell as <col>value</col>
...     listlike = pd.DataFrame([df_numpy, cols]).apply(lambda x: '<{0}>{1}</{0}>'.format(x[1], x[0])).tolist()
...     # join the formatted cells with newline characters
...     return '\n'.join(listlike)

Output:

>>> print(to_xml(x))
<A>aa</A>
<B>ab</B>
<C>ac</C>
<D>ad</D>
<A>aaa</A>
<B>abb</B>
<C>acc</C>
<D>add</D>
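
As an alternative, the same formatting can be sketched with NumPy's element-wise string functions and broadcasting, which avoids calling a Python lambda for every cell. This is only a sketch under assumptions: the name `to_xml_vectorized` is hypothetical, every value is converted to a string, and nothing is XML-escaped, so values containing `<` or `&` would produce invalid XML.

```python
import numpy as np
import pandas as pd


def to_xml_vectorized(df):
    """Format every cell as <col>value</col>, one cell per line (hypothetical helper)."""
    # Column names and cell values as fixed-width string arrays.
    cols = np.asarray(df.columns, dtype=str)   # shape (n_cols,)
    values = np.asarray(df, dtype=str)         # shape (n_rows, n_cols)

    # Build the opening and closing tags once per column.
    open_tags = np.char.add(np.char.add('<', cols), '>')
    close_tags = np.char.add(np.char.add('</', cols), '>')

    # Broadcasting pairs each column's tags with every row of that column.
    cells = np.char.add(np.char.add(open_tags, values), close_tags)

    # ravel() flattens row by row, matching the order in the question.
    return '\n'.join(cells.ravel())


x = pd.DataFrame([['aa', 'ab', 'ac', 'ad'], ['aaa', 'abb', 'acc', 'add']],
                 columns=['A', 'B', 'C', 'D'])
print(to_xml_vectorized(x))
```

For the example DataFrame x this should print the same eight lines as shown above. The tags are built once per column and broadcast across rows, so the column names are never repeated in a Python list.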
Jan Musil
  • Hi @Jan Musil, your program does produce the expected output. But the column headers are duplicated once for every row of the dataset: `cols = df.columns.tolist()*len(df.index)`. This will increase memory usage and make the program slow for huge data. If that cannot be avoided, I am fine with it. – Rajeshkanna Purushothaman Jun 27 '20 at 11:09
  • I believe it should still be very fast and should not noticeably increase memory usage. – Jan Musil Jun 27 '20 at 22:09
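
The memory question raised in the comments can be checked directly. The following is a minimal sketch, assuming a hypothetical 100,000-row frame that is not part of the original post. It measures only the repeated column list, not the intermediate DataFrame that `to_xml` builds afterwards.

```python
import sys

import pandas as pd

# Hypothetical test frame: 100,000 rows, 4 columns.
n_rows = 100_000
df = pd.DataFrame([['aa', 'ab', 'ac', 'ad']] * n_rows, columns=list('ABCD'))

# Same expression as in to_xml: one list entry per cell.
cols = df.columns.tolist() * len(df.index)

# Size of the list object itself (its references), in bytes.
list_bytes = sys.getsizeof(cols)

print(len(cols))            # number of entries, n_rows * n_cols
print(list_bytes)           # bytes used by the list of references
print(cols[0] is cols[4])   # repeated entries point at the same string object
```

List repetition copies references rather than strings, so the list grows with the number of cells but each entry costs only one pointer. Whether that is acceptable depends on the size of the data.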