
I have a CSV file whose first column contains the paths of the elements I need to change in an XML file. The text values for each new XML file to be created are given in columns 2 to 10,000.

Path                                                            Edit1       Edit2       Edit3       Edit4       Edit5          ----  Edit1000
".//data/country[@name="Singapore"]/gdpnp[@month="08"]/state",  5.2e-015,   2e-05,      8e-06,      9e-04,      0.4e-05,   
".//data/country[@name="Peru"]/gdppc[@month="06"]/region",      0.04,       0.02,       0.15,       3.24,       0.98,                                                 

I would like to replace the text of the elements in the original XML file (NoEdit.xml), located via the paths in column 1, with the values in each subsequent column, and name each output file accordingly; e.g., the XML based on column 2 values would be named Edit2.xml.

import csv
import xml.etree.ElementTree as ET
tree = ET.parse('NoEdit.xml')      
with open('csvlist.csv', 'rb') as csvlist:
    reader = csv.reader(csvlist, delimiter=',')
for x in range(1, 1000):
    for row in reader:
        if reader.line_num == 1: continue # skip the row of headers
        for data in tree.findall(row[0]):
            data.text = row[(x)]
            tree.write('Edit(x).xml')

Based on help from this forum (q1, q2), I have gotten as far as the code above. I get the error `KeyError: '".//data/country[@name="'`. When I use a fixed path, I still get an error on `findall`, or I just don't get the right XML.

I would appreciate any direction with this. Please feel free to suggest alternative methods as well.

Mia
  • `findall()` does not support the entirety of the XPath language. Use `lxml.etree` and the `xpath()` call. – Charles Duffy Jul 27 '15 at 16:06
  • It's important to distinguish between syntactic and literal quotes: syntactic quotes are quotes that are used by the language at hand as syntax elements (i.e. to decide what is inside of a string), but are not actually part of the string itself; literal quotes are part of the data. If you write `"foo"` in Python, the string's literal contents are just `foo`; the quotes are syntactic. Similarly, if you want a CSV cell with the same contents as Python's `".//yadda/yadda"`, it's just `.//yadda/yadda`. – Charles Duffy Jul 27 '15 at 16:56
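To make the syntactic/literal distinction concrete, here is a minimal sketch using Python's standard `csv` and `xml.etree.ElementTree` modules. The two CSV lines and the tiny XML document are made up for illustration. The first line escapes its inner quotes by doubling them; the second also wraps the whole cell in extra quotes, so the parsed path keeps literal `"` characters at both ends. ElementTree's path parser then reads the leading `".//country[@name="` as a quoted-string token it does not expect in that position, which, as far as I can tell from CPython's `ElementPath` module, surfaces as a `KeyError` much like the one in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical two-row CSV: row 1 doubles the inner quotes (valid CSV);
# row 2 additionally wraps the whole cell in literal quotes (the problem case).
sample = (
    '".//country[@name=""Singapore""]/state",5.2e-015\n'
    '""".//country[@name=""Singapore""]/state""",5.2e-015\n'
)
good_path, bad_path = (row[0] for row in csv.reader(io.StringIO(sample)))

print(good_path)  # .//country[@name="Singapore"]/state
print(bad_path)   # ".//country[@name="Singapore"]/state"

root = ET.fromstring(
    '<data><country name="Singapore"><state>x</state></country></data>'
)
print(len(root.findall(good_path)))  # 1

# The literal leading quote breaks ElementTree's path parsing.
error = None
try:
    root.findall(bad_path)
except KeyError as err:
    error = err
print(type(error).__name__, error)
```

Stripping the stray quotes (or, better, fixing the CSV so they are never there) leaves a path that `findall` accepts.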

1 Answer


This is not valid CSV:

".//data/country[@name="Singapore"]/gdpnp[@month="08"]/state",

Instead, it should be:

".//data/country[@name=""Singapore""]/gdpnp[@month=""08""]/state",

Notably, any literal " in the data needs to be doubled, to "", to disambiguate it from the ending quotes. (I'm curious how you created that file -- any spreadsheet program or other CSV generator should have gotten it right).
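Rather than hand-escaping, it is usually easier to let a CSV library do the quoting. A minimal sketch with Python's standard `csv` module, reusing the sample path and values from the question: `csv.writer` doubles embedded quotes automatically (its default `QUOTE_MINIMAL` mode quotes only fields that need it), and `csv.reader` undoes the doubling on the way back in.

```python
import csv
import io

# Sample row from the question: a path containing literal double quotes plus two values.
path = './/data/country[@name="Singapore"]/gdpnp[@month="08"]/state'

buf = io.StringIO()
csv.writer(buf).writerow([path, "5.2e-015", "2e-05"])
line = buf.getvalue()
print(line)
# ".//data/country[@name=""Singapore""]/gdpnp[@month=""08""]/state",5.2e-015,2e-05

# Reading the line back recovers the original string, with no extra quotes.
roundtrip = next(csv.reader(io.StringIO(line)))
print(roundtrip[0] == path)  # True
```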


I would also strongly suggest using lxml.etree here and its .xpath() call; ElementTree's .findall() supports only a limited subset of XPath.
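For what it's worth, the paths in this question use only simple attribute-equality predicates, which ElementTree's `findall` does support; it is fuller XPath 1.0 (most functions and axes) that calls for lxml's `.xpath()`. A quick check with a made-up document shaped like the sample paths, where `<data>` is assumed to be the root element, also shows a separate pitfall: `.//data/...` searches only descendants of the root, so it never matches the root itself.

```python
import xml.etree.ElementTree as ET

# Made-up document shaped like the paths in the question; <data> is the root.
doc = """<data>
  <country name="Singapore"><gdpnp month="08"><state>0</state></gdpnp></country>
  <country name="Peru"><gdppc month="06"><region>0</region></gdppc></country>
</data>"""
tree = ET.ElementTree(ET.fromstring(doc))

# ".//data/..." looks for a <data> element *below* the root, so it finds nothing here.
from_root = tree.findall('.//data/country[@name="Singapore"]/gdpnp[@month="08"]/state')
# Starting from the root's children, or searching with ".//country", does match.
relative = tree.findall('./country[@name="Singapore"]/gdpnp[@month="08"]/state')
anywhere = tree.findall('.//country[@name="Peru"]/gdppc[@month="06"]/region')

print(len(from_root), len(relative), len(anywhere))  # 0 1 1
```

Whether this applies depends on the real file's structure, which the question does not show.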

Charles Duffy
  • The paths shown are from before the CSV save; saving as CSV converts them to doubled quotes. I still get the error. I will start looking at lxml. Thanks – Mia Jul 27 '15 at 16:38
  • The exact format when opened with notepad is `""".//world/region[@name=""USA""]/demographics/populationMiniCAM[@year=""2020""]/totalPop"""` – Mia Jul 27 '15 at 16:41
  • ...so there are **literal quotes** around the string? Ditch those; they're wrong. In short -- make the string exactly what I showed in my answer. :) – Charles Duffy Jul 27 '15 at 16:42
  • Thanks, I'll do this and get back to you. – Mia Jul 27 '15 at 16:43
  • Thanks a lot, the error is gone now. Not to be a drag, but do you have any advice on the loop to create multiple XMLs? I am still having a problem with it; Python seems to create the new XML file and then immediately overwrite it. – Mia Jul 27 '15 at 17:09
  • I'd suggest `'Edit(%s).xml' % x`. – Charles Duffy Jul 27 '15 at 17:20
  • Thanks again. I tried your suggestion but get a `ValueError: I/O operation on closed file` error. – Mia Jul 27 '15 at 18:07
  • Sorry, please ignore that; it was an indentation error. – Mia Jul 27 '15 at 18:14
  • Thanks again; I'm trying to push my luck here. The dynamic naming works, but the code only produces Edit(1).xml: it is created and then overwritten by a new Edit(1).xml. I'm thinking I need to combine the two for loops. Any thoughts on this, sir? You have been more than helpful. – Mia Jul 27 '15 at 19:29
  • Sounds like you should be opening the output file in the outer loop, so it's already open before the inner loop starts. – Charles Duffy Jul 27 '15 at 19:31
  • I'm still getting the error `[Errno 22] invalid mode ('wb') or filename:'core..(1).xml'`. I edited the code to this based on your guidance: `import csv import xml.etree.ElementTree as ET tree = ET.parse('core_model_input_2010_05_03.xml') with open('csvlist2.csv', 'rb') as csvlist2: reader = csv.reader(csvlist2, delimiter=',') for x in range(1, 1000): for col in reader: if reader.line_num == 1: continue # skip the row of headers for data in tree.findall(col[0]): data.text = col[(x)] tree.write('core_model_input_2010_05_03(%s).xml' % x)` – Mia Jul 28 '15 at 18:34
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/84503/discussion-between-mia-and-charles-duffy). – Mia Jul 28 '15 at 19:32
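Pulling the thread together, here is one way the loop could be structured; it is a sketch, not the solution agreed in chat. It is written for Python 3 (the question's `'rb'` mode suggests Python 2), and `write_edits` and its `out_dir` parameter are hypothetical names. Two things differ from the question's code. First, the CSV rows are read into a list once: a `csv.reader` is a one-shot iterator, so in the original nested loops only `x == 1` ever sees any rows, which would explain why only `Edit(1).xml` appears. Second, `tree.write` is called once per column, after all rows are applied, instead of inside the innermost loop where it rewrites the same file over and over. Naming each file after its header cell (so `Edit1` becomes `Edit1.xml`) is an assumption; `'Edit(%s).xml' % col` works just as well if column-number naming is preferred.

```python
import csv
import os
import xml.etree.ElementTree as ET


def write_edits(xml_path, csv_path, out_dir="."):
    """Write one XML file per value column of the CSV (hypothetical helper).

    Assumptions, not from the original post: the first CSV row is a header
    whose cells after the first are used as output file names, and column 0
    holds ElementTree paths with no surrounding literal quotes.
    """
    # Read every row once; re-looping over a csv.reader yields nothing
    # after its first full pass.
    with open(csv_path, newline="") as f:
        header, *rows = list(csv.reader(f))

    written = []
    for col in range(1, len(header)):
        name = header[col].strip()
        if not name:
            continue  # skip unnamed columns (e.g. from a trailing comma)
        # Re-parse the untouched original so every output starts unedited.
        tree = ET.parse(xml_path)
        for row in rows:
            if len(row) <= col or not row[0].strip():
                continue  # skip short or blank rows
            for elem in tree.findall(row[0].strip()):
                elem.text = row[col].strip()
        # Write once per column, after all rows have been applied.
        out_path = os.path.join(out_dir, name + ".xml")
        tree.write(out_path)
        written.append(out_path)
    return written


# Usage with the file names from the question:
# write_edits("NoEdit.xml", "csvlist.csv")
```

Re-parsing per column costs a little time on large files, but it keeps one column's edits from leaking into the next if some rows are blank or paths differ.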