I am using this Python script to convert CSV to XML. After conversion I see tags in the text (vim), which causes XML parsing error.
I am already tried answers from here, without success.
The converted XML file.
Thanks for any help!
Your input file has BOM (byte-order mark) characters, and Python doesn't strip them automatically when file is encoded in utf8. See: Reading Unicode file data with BOM chars in Python
>>> s = '\xef\xbb\xbfABC'
>>> s.decode('utf8')
u'\ufeffABC'
>>> s.decode('utf-8-sig')
u'ABC'
So for your specific case, try something like
from io import StringIO
s = StringIO(open(csvFile).read().decode('utf-8-sig'))
csvData = csv.reader(s)
Very terrible style, but that script is a hacked together script anyway for a one-shot job.
Change utf-8 to utf-8-sig
import csv with open('example.txt', 'r', encoding='utf-8-sig') as file:
Here's an example of a script that uses a real XML-aware library to run a similar conversion. It doesn't have the exact same output, but, well, it's an example -- salt to taste.
import csv
import lxml.etree
csvFile = 'myData.csv'
xmlFile = 'myData.xml'
reader = csv.reader(open(csvFile, 'r'))
with lxml.etree.xmlfile(xmlFile) as xf:
xf.write_declaration(standalone=True)
with xf.element('root'):
for row in reader:
row_el = lxml.etree.Element('row')
for col in row:
col_el = lxml.etree.SubElement(row_el, 'col')
col_el.text = col
xf.write(row_el)
To refer to the content of, say, row 2 column 3, you'd then use XPath like /row[2]/col[3]/text()
.