I am trying to create an XML file from a CSV file.

CSV:

CatOne, CatTwo, CatThree
ProdOne, ProdTwo, ProdThree
ProductOne, ProductTwo, ProductThree

Desired XML:

<root>
  <prod>
    <CatOne>ProdOne</CatOne>
    <CatTwo>ProdTwo</CatTwo>
    <CatThree>ProdThree</CatThree>
  </prod>
  <prod>
    <CatOne>ProductOne</CatOne>
    <CatTwo>ProductTwo</CatTwo>
    <CatThree>ProductThree</CatThree>
  </prod>
</root>

Here is my code:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import csv, sys, os
from lxml import etree

def main():
    csvFile = 'test.csv'
    xmlFile = open('myData.xml', 'w')
    csvData = csv.reader(open(csvFile), delimiter='\t')
    header = csvData.next()
    details = csvData.next()
    details2 = csvData.next()
    root = etree.Element('root')

    prod = etree.SubElement(root,'prod')
    for index in range(0, len(header)):
        child = etree.SubElement(prod, header[index])
        child.text = details[index]
        prod.append(child)   
    prod = etree.SubElement(root,'prod')
    for index in range(0, len(header)):
        child = etree.SubElement(prod, header[index])
        child.text = details2[index]
        prod.append(child)      
    result = etree.tostring(root, pretty_print=True)
    xmlFile.write(result)  

if __name__ == '__main__':
    main()

I am getting the desired output, but the way I am doing it is really clumsy: every data row is read and converted by hand. I'd like a generic approach, and I believe a much more Pythonic solution is possible, but I can't figure out how to do it. The code should also work if the CSV has 10 or even 20 lines.

Thanks for the help.

xhallix

1 Answer

OK, I found out how to solve it.

I will answer my own question here in the hope that it helps someone else.

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 code: the csv module yields byte strings here, hence decode() below.
import csv
from lxml import etree

def main():
    csvFile = 'test.csv'
    xmlFile = open('myData.xml', 'w')
    csvData = csv.reader(open(csvFile), delimiter='\t')

    # The first row holds the column names, which become the tag names.
    header = csvData.next()
    root = etree.Element('root')

    # Every remaining row becomes one <prod> element.
    for row in csvData:
        prod = etree.SubElement(root,'prod')
        for index in range(0, len(header)):
            # SubElement() already attaches child to prod,
            # so no separate prod.append(child) is needed.
            child = etree.SubElement(prod, header[index])
            child.text = row[index].decode('utf-8')

    result = etree.tostring(root, pretty_print=True)
    xmlFile.write(result)
    xmlFile.close()

if __name__ == '__main__':
    main()
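
The code above is Python 2 (`csvData.next()`, `str.decode`). As a rough sketch only, here is how the same idea might look on Python 3, using the `zip()` form suggested in the comments. Two things are assumptions of this sketch and not part of the answer: it uses the standard library's `xml.etree.ElementTree` as a stand-in for `lxml.etree` so that it runs without third-party packages, and the names `csv_to_root` and `write_xml` are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree


def csv_to_root(csv_file, delimiter='\t'):
    """Build <root><prod>...</prod>...</root> from an open CSV file object."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    header = next(reader)  # Python 3 replacement for reader.next()
    root = ET.Element('root')
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        prod = ET.SubElement(root, 'prod')
        # zip() pairs each column name with the value in the same position
        for category, prod_text in zip(header, row):
            ET.SubElement(prod, category).text = prod_text
    return root


def write_xml(root, path):
    """Hypothetical helper: write the tree to a file as UTF-8."""
    if hasattr(ET, 'indent'):  # pretty-printing helper, only on Python 3.9+
        ET.indent(root)
    ET.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)


# Tab-separated sample data taken from the question, read from memory
# instead of test.csv so the sketch is self-contained.
sample = (
    "CatOne\tCatTwo\tCatThree\n"
    "ProdOne\tProdTwo\tProdThree\n"
    "ProductOne\tProductTwo\tProductThree\n"
)
root = csv_to_root(io.StringIO(sample))
print(ET.tostring(root, encoding='unicode'))
```

For a real file, something like `with open('test.csv', newline='', encoding='utf-8') as f: write_xml(csv_to_root(f), 'myData.xml')` should work. Because the loop iterates over the reader, the number of data rows does not matter.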
xhallix
  • you could use `for cat, prod in zip(headers, row):` instead of `for index in range ...`. – jfs Nov 19 '13 at 06:43
  • Thank you for adding this comment. I will take a look at how to deal with `header[index]` then, because I don't see how to change this at first glance. – xhallix Nov 19 '13 at 09:05
  • `cat == header[index]`. See [how `zip()` works](http://docs.python.org/2/library/functions.html#zip). I meant: `for category, prod_text in zip(headers, row)`, to avoid a conflict with `prod`. – jfs Nov 20 '13 at 01:43
  • Your call to `prod.append(child)` ends up creating two links to the child and results in duplicate child elements in the XML output. The child is already linked to `prod`, because you create it with `SubElement`. – nrjohnstone Mar 30 '16 at 02:50
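
To illustrate the last comment, here is a small sketch of what `SubElement` followed by `append` does. It assumes the standard library's `xml.etree.ElementTree`, not the `lxml.etree` used in the answer, and the two `count_*` variable names are made up for the example. As far as I know, lxml moves an element that is appended a second time instead of duplicating it, so the answer's own output may be unaffected; either way the extra call is unnecessary.

```python
import xml.etree.ElementTree as ET  # assumption: stdlib ElementTree, not lxml

prod = ET.Element('prod')

# SubElement() creates the element AND attaches it to prod in one step.
child = ET.SubElement(prod, 'CatOne')
child.text = 'ProdOne'
count_after_subelement = len(prod)

# Appending the same element again adds a second reference to it here.
prod.append(child)
count_after_append = len(prod)

print(count_after_subelement, count_after_append)
print(ET.tostring(prod, encoding='unicode'))
```

Dropping the `append` call leaves exactly one child per column, which is what the desired XML in the question shows.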