0

Hello I'm new to Python,

and I would like to convert a .csvfile to XML. The desired output should look like, where I would like to have each individual ID within a Node: <employee id="5"> and the variables corresponding to each individual beneath each other rather then on the same line:

<?xml version="1.0" encoding="UTF-8"?>
<Document>
  <employee id="1">
    <Name>Steve</Name>
    <City>Boston</City>
    <Age>33</Age>
  </employee>
  <employee id="2">
    <Name>Michael</Name>
    <City>Dallas</City>
    <Age>45</Age>
  </employee>
  <employee id="3">
    <Name>John</Name>
    <City>New York</City>
    <Age>89</Age>
  </employee>
  <employee id="4">
    <Name>Thomas</Name>
    <City>LA</City>
    <Age>62</Age>
  </employee>
  <employee id="5">
    <Name>Clint</Name>
    <City>Paris</City>
    <Age>30</Age>
  </employee>
</Document>

Given some data:

import pandas
ID = pandas.DataFrame([1,2,3,4,5])
name = pandas.DataFrame(["Steve","Michael","John","Thomas","Clint"])
city = pandas.DataFrame(["Boston","Dallas","New York","LA","Paris"])
Age = pandas.DataFrame([45,33,33,20,50])
df = pandas.concat([ID, name,city,Age], axis=1)
df.columns = ['ID','name','city','Age']

df

    ID  name    city    Age

0   1   Steve   Boston  45
1   2   Michael Dallas  33
2   3   John    New York    33
3   4   Thomas  LA  20
4   5   Clint   Paris   50

And the conversion from .csv to XML:

import csv

csvFile = 'df.csv'
xmlFile = 'myData.xml'

csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Document>' + "\n")

rowNum = 0
for employee in csvData:
    if rowNum == 0:
       tags = employee 
       # replace spaces w/ underscores in tag names
       for i in range(len(tags)):
           tags[i] = tags[i].replace(' ', '_')
    else: 
       xmlData.write('<employee >' + "\n")
       for i in range(len(tags)):
           xmlData.write('    ' + '<' + tags[i] + '>' \
                        + employee [i] + '</' + tags[i] + '>' + "\n")
       xmlData.write('</employee >' + "\n")

    rowNum +=1

xmlData.write('</Document>' + "\n")
xmlData.close()

Output XML which looks a bit off as desired:

<<?xml version="1.0"?>
<Document>
<employee>
    <X>1</X>
    <ID>1</ID>
    <Name>Steve</Name>
    <City>Boston</City>
    <Age>33</Age>
</employee>
<employee>
    <X>2</X>
    <ID>2</ID>
    <Name>Michael</Name>
    <City>Dallas</City>
    <Age>45</Age>
</employee>
<employee>
    <X>3</X>
    <ID>3</ID>
    <Name>John</Name>
   <City>New York</City>
    <Age>89</Age>
</employee>
<employee>
    <X>4</X>
    <ID>4</ID>
    <Name>Thomas</Name>
    <City>LA</City>
    <Age>62</Age>
</employee>
 <employee>
    <X>5</X>
    <ID>5</ID>
    <Name>Clint</Name>
    <City>Paris</City>
    <Age>30</Age>
</employee>
</Document>
Googme
  • 914
  • 7
  • 27
  • Duplicate of [pandas dataframe to xml](http://stackoverflow.com/questions/31260151/python-pandas-dataframe-to-xml) – aluriak Sep 01 '16 at 14:49
  • 3
    Hi @aluriak, thank you for that hint. However, the only duplicate are the topics aka XML and csv not the question, at least from my perspective – Googme Sep 01 '16 at 14:52
  • Hello, how about this answer? http://stackoverflow.com/a/18576067/6313992 – Tomasz Plaskota Sep 01 '16 at 15:01
  • Don't create XML files manually! Always use an XML parser: https://docs.python.org/2/library/xml.etree.elementtree.html – mavroprovato Sep 01 '16 at 15:04

2 Answers2

2

You need to specify the delimiter of the csv file when creating the csv reader object (default is ',').

csvData = csv.reader(open(csvFile), delimiter=' ')

If this is not given, then the entries of tags are not in the format you want.


The else section in your for loop is not correct. This should be the solution:

import csv

csvFile = 'df.csv'
xmlFile = 'myData.xml'

csvData = csv.reader(open(csvFile), delimiter=';')
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Document>' + "\n")

rowNum = 0
for employee in csvData:
    if rowNum == 0:
       tags = employee 
       # replace spaces w/ underscores in tag names
       for i in range(len(tags)):
           tags[i] = tags[i].replace(' ', '_')
    else: 
       xmlData.write('<employee ' + tags[0] + '="' + employee[0] + '" >' + "\n")
       for i in range(1,len(tags)):
           xmlData.write('    ' + '<' + tags[i] + '>' \
                        + employee [i] + '</' + tags[i] + '>' + "\n")
       xmlData.write('</employee >' + "\n")

    rowNum +=1

xmlData.write('</Document>' + "\n")
xmlData.close()
MaBe
  • 91
  • 5
  • Hi @MaBe, I edited the question. Thank you for that hint. However the main problem remains to create a node beginning with and so on – Googme Sep 01 '16 at 15:14
2

Using an XML parser will be far easier. Here is your example using the xml.etree.ElementTree module. I assumed that you converted the dataframe to csv with df.to_csv('df.csv')

import csv
import xml.etree.ElementTree as ET

csvFile = 'df.csv'
csvData = csv.reader(open(csvFile))

root = ET.Element('Document')
next(csvData)  # skip header
for _, employee_id, name, city, age in csvData:
    employee_elem = ET.SubElement(root, "Employee")
    employee_elem.set('id', employee_id)  # set attribute

    # Child elements
    name_elem = ET.SubElement(employee_elem, "Name")
    name_elem.text = name
    city_elem = ET.SubElement(employee_elem, "City")
    city_elem.text = city
    age_elem = ET.SubElement(employee_elem, "Name")
    age_elem.text = age

ET.ElementTree(root).write('df.xml', encoding='utf-8', xml_declaration=True)
mavroprovato
  • 8,023
  • 5
  • 37
  • 52
  • Hi @mavroprovato thank you for your time! May I ask you one final question. How would you save the `XML` with appropriate `UTF-8` encoding, as in my example above, by adding `xmlFile = 'myData.xml'` `xmlData.close()` and `xmlData.write(?xml version="1.0" encoding="UTF-8"?\>')` at the end of the document or beginning? – Googme Sep 01 '16 at 16:03
  • See my updated answer. The documentation for the `write` method is here: https://docs.python.org/3.5/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree.write – mavroprovato Sep 01 '16 at 16:14