2

I know this question has been asked before and my last was put on hold, so now I'm specifying it detailed. I have a CSV file of population information, I read it to pandas and now have to transform it to XML, for example like this
<?xml version="1.0" encoding="utf-8"?> <populationdata> <municipality> <name> Akaa </name> <year> 2014 </year> <total> 17052 ......

This is the reading part of my code:
import pandas as pd pop = pd.read_csv(r'''directory\population.csv''', delimiter=";")

Tried doing it like in mentioned before in the link here with function and cycle: How do convert a pandas/dataframe to XML?. Haven't succeeded, any other recommendations maybe?

This is an example of my dataframe:
Alahärmä 2014 0 0.1 0.2 0 Alajärvi 2014 10171 5102 5069 1 Alastaro 2014 0 0 0 2 Alavieska 2014 2687 1400 1287 3 Alavus 2014 12103 6102 6001 4 Anjalankoski 2014 0 0 0

Fairly new to python, so any help is apreciated.

Tautvy Da
  • 75
  • 1
  • 9
  • Could you show us a head() of your dataframe – Paula Livingstone Nov 07 '17 at 12:14
  • this is what it showed: `Alahärmä 2014 0 0.1 0.2 0 Alajärvi 2014 10171 5102 5069 1 Alastaro 2014 0 0 0 2 Alavieska 2014 2687 1400 1287 3 Alavus 2014 12103 6102 6001 4 Anjalankoski 2014 0 0 0` – Tautvy Da Nov 07 '17 at 12:19
  • Does this answer your question? [How do convert a pandas/dataframe to XML?](https://stackoverflow.com/questions/18574108/how-do-convert-a-pandas-dataframe-to-xml) – iacob Mar 25 '21 at 09:26

1 Answers1

1

The question you have linked to actually has a great answer to your question but I guess you’re having difficulty transposing your data into that solution so Ive done it below for you.

Ok your level of detail is a bit sketchy. If your specific situation differs slightly then you'll need to tweak my answer but heres something that works for me:

First off assuming you have a text file as follows :

0       Alahärmä  2014      0   0.1   0.2
1      Alajärvi  2014  10171  5102  5069
2      Alastaro  2014      0     0     0
3     Alavieska  2014   2687  1400  1287
4        Alavus  2014  12103  6102  6001
5  Anjalankoski  2014      0     0     0

Moving on to creating the python script, we first import that text file using the following line:

pop = pd.read_csv(r'directory\population.csv', delimiter=r"\s+", names=['cityname', 'year', 'total', 'male', 'females'])

This brings in the text file as a dataframe and gives the new dataframe the correct column headers.

Then taking the data from the question you linked to, we add the following to our python script:

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

print('\n'.join(pop.apply(func, axis=1)))

Now we put it all together and we get the below:

import pandas as pd
pop = pd.read_csv(r'directory\population.csv', delimiter=r"\s+", names=['cityname', 'year', 'total', 'male', 'females'])

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

print('\n'.join(pop.apply(func, axis=1)))

When we run the above file we get the following output:

<item>
  <field name="cityname">Alahärmä</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.1</field>
  <field name="females">0.2</field>
</item>
<item>
  <field name="cityname">Alajärvi</field>
  <field name="year">2014</field>
  <field name="total">10171</field>
  <field name="male">5102.0</field>
  <field name="females">5069.0</field>
</item>
<item>
  <field name="cityname">Alastaro</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.0</field>
  <field name="females">0.0</field>
</item>
<item>
  <field name="cityname">Alavieska</field>
  <field name="year">2014</field>
  <field name="total">2687</field>
  <field name="male">1400.0</field>
  <field name="females">1287.0</field>
</item>
<item>
  <field name="cityname">Alavus</field>
  <field name="year">2014</field>
  <field name="total">12103</field>
  <field name="male">6102.0</field>
  <field name="females">6001.0</field>
</item>
<item>
  <field name="cityname">Anjalankoski</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.0</field>
  <field name="females">0.0</field>
</item>
Winand
  • 2,093
  • 3
  • 28
  • 48
Paula Livingstone
  • 1,175
  • 1
  • 12
  • 21
  • You are gold! I was getting few errors, because of brackets or ' in wrong place. Thanks! – Tautvy Da Nov 07 '17 at 13:55
  • Last question: how about would I be able to save this output to xml file? – Tautvy Da Nov 07 '17 at 14:59
  • If it still helps you or someone: import lxml.etree as ET parser = ET.XMLParser(recover=True) tree = ET.ElementTree(ET.fromstring(xml, parser=parser)) tree.write(filelocation) – Anu Oct 21 '20 at 03:23