
I have a csv file (csvlist.csv) whose first column contains the paths of the XML elements I need to change. The new text for each of those elements is given in column 2 onwards (up to 10,000 columns).

Path                                                            Edit1       Edit2       Edit3       Edit4       Edit5          ----  Edit1000
".//data/country[@name="Singapore"]/gdpnp[@month="08"]/state",  5.2e-015,   2e-05,      8e-06,      9e-04,      0.4e-05,   
".//data/country[@name="Peru"]/gdppc[@month="06"]/region",      0.04,       0.02,       0.15,       3.24,       0.98,                                                 

For each column after the first, I would like to replace the text of the elements of the original XML file (NoEdit.xml) found at the paths in column 1 with the values in that column, and save the result under a matching name, e.g. the XML based on the column 2 values will be named Edit2.xml.

import csv
import xml.etree.ElementTree as ET
tree = ET.parse('NoEdit.xml')      
with open('csvlist.csv', 'rb') as csvlist:
    reader = csv.reader(csvlist, delimiter=',')
for x in range(1, 1000):
    for row in reader:
        if reader.line_num == 1: continue # skip the row of headers
        for data in tree.findall(row[0]):
            data.text = row[(x)]
            tree.write('Edit(x).xml')

Based on help from this forum (q1, q2), I have gotten as far as the code above. I get the error:

KeyError: '".//data/country[@name="'

When I hard-code a path to get past this error, I still get an error from findall, or the resulting XML is not what I expect.

I would appreciate any help or direction with this. Please feel free to suggest alternative methods as well. Thanks

Mia
  • Is this even the right approach? – Mia Jul 27 '15 at 15:11
  • You need to properly escape, remove, or convert the double quotes to single quotes in the path passed to `findall(xpath)`. Try removing the first and last quotes in the csv column using find+replace. Alternatively, replace the quotes around the attribute values with single quotes, or escape them with a backslash. – Parfait Jul 29 '15 at 03:50
  • I made the edit. I still get an error: [Errno 22] invalid mode ('wb') or filename:'Edit(1).xml' – Mia Jul 29 '15 at 04:30
  • Format the file name with `x` variable: `tree.write('Edit({0}).xml'.format(x))` – Parfait Jul 29 '15 at 18:11
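
To illustrate the two suggestions in the comments above, here is a minimal sketch, not a definitive fix. It assumes the paths in the csv use single quotes around attribute values, and it builds the output name with `format(x)`. The inline `XML` and `CSV` strings are hypothetical stand-ins for NoEdit.xml and csvlist.csv, and the results are collected in an `outputs` dict instead of being written to disk.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for NoEdit.xml (the real structure is an assumption).
XML = """<root><data>
  <country name="Singapore"><gdpnp month="08"><state>1</state></gdpnp></country>
  <country name="Peru"><gdppc month="06"><region>2</region></gdppc></country>
</data></root>"""

# Hypothetical stand-in for csvlist.csv. Attribute values inside the paths use
# single quotes, so they do not clash with the csv field quotes.
CSV = '''Path,Edit1,Edit2
".//data/country[@name='Singapore']/gdpnp[@month='08']/state",5.2e-015,2e-05
".//data/country[@name='Peru']/gdppc[@month='06']/region",0.04,0.02
'''

# Read all rows up front: a csv reader can only be iterated once.
rows = list(csv.reader(io.StringIO(CSV)))
header, body = rows[0], rows[1:]

outputs = {}
for x in range(1, len(header)):
    root = ET.fromstring(XML)  # fresh copy of the original for each column
    for row in body:
        for element in root.findall(row[0]):
            element.text = row[x].strip()
    # Substitute the column number into the name instead of a literal "(x)".
    filename = 'Edit{0}.xml'.format(x)
    outputs[filename] = ET.tostring(root, encoding='unicode')
    # To write a real file: ET.ElementTree(root).write(filename)
```

Reading the rows into a list before the outer loop matters here, because the reader would otherwise be exhausted after the first column.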

1 Answer


First of all, you should provide a reproducible example so that other users can help you. I've done some of that work for you and created a test csv file like this:

Path,                                             Edit1,       Edit2
".//first",  5.2e-015,   2e-05
".//second",      0.04,       0.02

Note that I added commas to the header, because yours had none and I don't know whether that was intentional or a typo.

Also I've created a small xml file:

<root>
    <first>1</first>
    <second>2</second>
    <third>3</third>
</root>

And the script:

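import csv
import sys

from lxml import etree

# Read the XML once, as bytes, so lxml can honour any encoding declaration.
with open(sys.argv[2], 'rb') as xmlfile:
    xmldata = xmlfile.read()

with open(sys.argv[1], newline='') as csvfile:
    # zip(*rows) pivots the csv: each column becomes one tuple.
    for i, pivoted_row in enumerate(zip(*csv.reader(csvfile, delimiter=','))):
        if i == 0:
            # First column: the header cell followed by the xpaths.
            xpaths = pivoted_row
            continue
        pivoted_row = [c.strip() for c in pivoted_row]
        # Start from a fresh copy of the original document for each output file.
        tree = etree.fromstring(xmldata)
        for j in range(1, len(xpaths)):
            tree.xpath(xpaths[j])[0].text = pivoted_row[j]
        # pivoted_row[0] is the column header, e.g. "Edit1".
        with open(pivoted_row[0] + ".xml", 'wb') as outfile:
            outfile.write(etree.tostring(tree))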

The key part is to pivot the csv so that all the data destined for one output file can be handled at once. After pivoting, the first column (the xpaths) becomes the first row, so I save it in a variable and reuse it for every subsequent row.
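
As a small standalone illustration of that pivot, the sketch below applies `zip(*rows)` to an in-memory copy of the test csv. The names `columns` and `edits` are only for illustration and do not appear in the script.

```python
import csv
import io

# In-memory copy of the test csv, so the example does not need a file.
CSV = '''Path,Edit1,Edit2
".//first",5.2e-015,2e-05
".//second",0.04,0.02
'''

rows = csv.reader(io.StringIO(CSV))

# zip(*rows) turns the list of rows into a list of columns.
columns = list(zip(*rows))

# First column: the header cell "Path" followed by the xpaths.
xpaths = columns[0][1:]

# Every other column: its header (the output name) followed by the new values.
edits = {col[0]: dict(zip(xpaths, col[1:])) for col in columns[1:]}

print(edits['Edit1'])
```

Note that `zip` stops at the shortest row, so a row with fewer cells than the header silently drops the trailing columns.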

Run it like:

python3 script.py csvfile xmlfile

It creates two files, Edit1.xml and Edit2.xml, with the following content:

==> Edit1.xml <==
<root>
    <first>5.2e-015</first>
    <second>0.04</second>
    <third>3</third>
</root>
==> Edit2.xml <==
<root>
    <first>2e-05</first>
    <second>0.02</second>
    <third>3</third>
</root>

I hope this is useful and sets you on the right track to solve your problem.

Birei
  • Thanks @Birei. Do you have any idea whether a faster implementation is possible? When I remove the `, newline=''` from the `open` statement, the code works, but the run time is very slow for what I want to do. Do you think ElementTree would be faster? – Mia Jul 29 '15 at 20:26
  • @Mia: `lxml` is pretty fast. I don't think `ElementTree` will perform much better, but I'm not sure, so your best bet is to test it. Besides that, you could try to parse the `xml` data once and reuse the modified tree for each subsequent row. It shouldn't matter, since the same elements are changed on every iteration, so try moving `tree = etree.fromstring(xmldata)` to just after the first `with` statement. – Birei Jul 29 '15 at 21:39
  • Thanks, I'll implement ElementTree and get back. I took `tree = etree.fromstring(xmldata)` out of the loop and put it after the first `with` statement, but that actually made it slower than before. – Mia Jul 30 '15 at 06:43
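
For anyone who wants to test the ElementTree route discussed in these comments, the following is a rough sketch under stated assumptions, not a benchmarked solution. It parses the document once, resolves each path once, and then overwrites the same elements for every column. Resolving the paths up front (the `targets` list) is an addition that is not in the answer, and whether it helps at all depends on where the time is actually spent. The inline `XML` and `CSV` strings stand in for the real files.

```python
import csv
import io
import xml.etree.ElementTree as ET

# In-memory stand-ins for the xml and csv test files from the answer.
XML = "<root><first>1</first><second>2</second><third>3</third></root>"
CSV = '''Path,Edit1,Edit2
".//first",5.2e-015,2e-05
".//second",0.04,0.02
'''

root = ET.fromstring(XML)  # parsed once and reused for every column

columns = list(zip(*csv.reader(io.StringIO(CSV))))
xpaths = columns[0][1:]

# Resolve each path once. find() returns None for a path that matches nothing,
# so a real script should check for that before assigning text.
targets = [root.find(path) for path in xpaths]

results = {}
for column in columns[1:]:
    name, values = column[0].strip(), column[1:]
    for element, value in zip(targets, values):
        element.text = value.strip()
    # Serialise the current state; every column overwrites the same elements.
    results[name + '.xml'] = ET.tostring(root, encoding='unicode')
```

Reusing one tree is only safe because every column sets exactly the same elements; if some columns left a cell untouched, the value from the previous column would carry over.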