Lost in XML and Python

Question

Hi, I have started learning Python and want to use it to modify an XML file.

I have been looking for information on the best course to follow, but frankly I got a little lost. There are so many ways of manipulating XML files: ElementTree, lxml, minidom, etc. Could someone point me in the right direction, or to some code I can wrap my head around? I have started experimenting with lxml but haven't gotten any further than printing all elements yet.

Here is what I am trying to do :

Read a line from the csv file. Load in Label and FullPath.
Look in XML file for ITEM with matching FullPath
Change the FLAG1 for that ITEM to TRUE
Change the FLAG2 and FLAG3 for that ITEM to FALSE
Change the Label for that ITEM to the Label from the CSV file.
Write out new.xml

Below is my XML structure. The two records below repeat about 10,000 times in the file.

<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>  
    <Flag3>FALSE</Flag3>
    <Label>RED</Label> <<-2- After finding 1 I need to change THIS(only this)
    <Path>C:\\test\\</Path> <-1- I need to find this 
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
 </ThisIsMyData>

So I have a root (ThisIsMyData), then a lot of ITEM elements. Each of them has 7 subelements.

This is what my CSV file looks like, followed by what I need my output to look like. CSV file:

  Label;FullPath
  YELLOW;C:\\test\\test.png
  YELLOW;c:\\test\\test2\\blue.png

 <ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>FALSE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>TRUE</Flag3>
    <Label>YELLOW</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>FALSE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>TRUE</Flag3>
    <Label>YELLOW</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
 </ThisIsMyData>

Pastebin link in case layout gets messed up :

http://pastebin.com/embed_js.php?i=QEx2ZGuY

I am trying ElementTree right now, using this example: http://pymotw.com/2/xml/etree/ElementTree/parse.html. I have managed to search the XML for a certain element name and print its contents, but I still do not see a way of finding a matching element on the same level.

from xml.etree import ElementTree
with open('mydata.xml', 'rt') as f:
    tree = ElementTree.parse(f)
#    filelist = ElementTree.ElementTree.find()
for node in tree.findall('.//file'):
    FileName = node.tag, node.text
    print FileName

Output :

('file', 'test.png')
('file', 'blue.png')

You can do what you want with any of the parsers you mentioned. Can you be more concrete about the problem you cannot get by ? — Bogdan, Jan 30 '12 at 10:40

score 1 · Accepted Answer · answered Jan 30 '12 at 13:21

Here's a quick example of how to do what I think you want using lxml.etree and xpath.

from cStringIO import StringIO
from lxml import etree

xmlfile = StringIO("""
<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>  
    <Flag3>FALSE</Flag3>
    <Label>RED</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
 </ThisIsMyData>
""".strip())

datafile = StringIO("""
Label;FullPath
YELLOW;C:\\test\\test.png
YELLOW;c:\\test\\test2\\blue.png
""".strip())

# Read "csv". Simple, no error checking, skip first line.
filenameToLabel = {}
for l,f in (x.strip().split(';') for x in datafile.readlines()[1:]):
  filenameToLabel[f] = l

def first(seq,default=None):
  """xpath helper function: return the first item of seq, or default if seq is empty"""
  for item in seq:
    return item
  return default

doc = etree.XML(xmlfile.read())

for item in doc.xpath('//ITEM'):
  item_filename = first(item.xpath('./Path/text()'),'').strip() + first(item.xpath('./file/text()'),'').strip()
  label = filenameToLabel.get(item_filename)
  if label is not None:
    first(item.xpath('./Flag1')).text = 'TRUE'
    first(item.xpath('./Flag2')).text = 'FALSE'
    first(item.xpath('./Flag3')).text = 'FALSE'
    first(item.xpath('./Label')).text = label

print etree.tostring(doc)

Yields

<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>YELLOW</Label>
    <Path>C:\test\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>YELLOW</Label>
    <Path>c:\test\test2\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>
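
For comparison, the same idea can be sketched with only the standard library's xml.etree.ElementTree. This is a rough sketch in Python 3 syntax, not a drop-in replacement for the lxml code above: the relabel function name and the hard-coded labels dict (standing in for what would be read from the CSV) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (raw string keeps the backslashes as-is).
XML_TEXT = r"""<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>RED</Label>
    <Path>C:\test\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>Blue</Label>
    <Path>c:\test\test2\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>"""

# Assumption: full path -> label, as it would come out of the CSV file.
labels = {
    r"C:\test\test.png": "YELLOW",
    r"c:\test\test2\blue.png": "YELLOW",
}


def relabel(root, labels):
    """Update every ITEM whose Path + file matches a key in labels.

    Returns the number of ITEM elements that were changed.
    """
    changed = 0
    for item in root.iter("ITEM"):
        # Path and file are siblings, so look both up relative to the ITEM.
        full_path = (item.findtext("Path", default="").strip()
                     + item.findtext("file", default="").strip())
        label = labels.get(full_path)
        if label is None:
            continue
        item.find("Flag1").text = "TRUE"
        item.find("Flag2").text = "FALSE"
        item.find("Flag3").text = "FALSE"
        item.find("Label").text = label
        changed += 1
    return changed


root = ET.fromstring(XML_TEXT)
changed = relabel(root, labels)
result = ET.tostring(root, encoding="unicode")
```

With a real file, ET.parse('mydata.xml').getroot() would replace ET.fromstring, and ET.ElementTree(root).write('new.xml') would write the result to disk instead of building a string.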

Ooh great thank you for the example ( working file ) the answer was in this line : item_filename = first(item.xpath('./Path/text()'),'').strip() + first(item.xpath('./file/text()'),'').strip() . — LessPythonic, Jan 30 '12 at 15:19

score 0 · Answer 2 · edited May 23 '17 at 11:51


First, use Python's csv module to read the data from the CSV file. A plain string split also works fine if the data is not big.

Then create your XML using etree.XML.

example:

>>> from lxml import etree
>>> csv_value = 'C:\\test\\'
>>> st = '<document>'+'<Flag1>FALSE</Flag1>' + '<Flag2>FALSE</Flag2>'+'<Path>' + csv_value + '</Path>' + '</document>'
>>> tree = etree.XML(st)
>>> etree.tostring(tree)
'<document><Flag1>FALSE</Flag1><Flag2>FALSE</Flag2><Path>C:\\test\\</Path></document>'

Fetching csv_value is left to you as an exercise.
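
As a starting point for that exercise, a minimal sketch with the csv module might look like the following. It assumes Python 3 and the semicolon-delimited layout from the question; the in-memory io.StringIO is only a stand-in for the real file, and the variable names are made up for illustration.

```python
import csv
import io

# Stand-in for something like open('data.csv', newline=''); the file name is hypothetical.
csv_file = io.StringIO(
    "Label;FullPath\n"
    "YELLOW;C:\\test\\test.png\n"
    "YELLOW;c:\\test\\test2\\blue.png\n"
)

reader = csv.reader(csv_file, delimiter=";")
next(reader)  # skip the "Label;FullPath" header row

# Each remaining row is a two-item list: [label, full_path].
rows = [(label, full_path) for label, full_path in reader]

# A dict keyed by path makes the later lookup per ITEM a single .get() call.
filename_to_label = {full_path: label for label, full_path in rows}
```

Each full_path here plays the role of csv_value in the snippet above.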

Also take a look at this question.

answered Jan 30 '12 at 10:54 by RanRag · edited May 23 '17 at 11:51 by Community

Thank you for the answer, but I think I was too vague in describing what I am trying to do (probably because I am using the wrong terms). This just prints out a constructed XML. What I need to do is: look in the first XML file for the subelements of ITEM containing the second string in the CSV file, then edit the subelement on the same level named Label to the first string in the CSV file. All tutorials I have found on lxml are about finding certain elements of a root, but none show how to find a subelement of an element of a root containing a certain string. – LessPythonic Jan 30 '12 at 11:11

score 0 · Answer 3 · edited May 23 '17 at 12:15


I find that Beautiful Soup, and its sister, Beautiful Stone Soup, have really good, terse, example-based documentation that lends itself to diving in and trying things out on real world examples.

But I've also heard that ElementTree is considered by some to be the gold standard in Python.
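
As a rough illustration of the ElementTree route, its limited XPath support includes a [tag='text'] predicate, which may be enough to pick an ITEM by the text of one of its children and then edit a sibling. The sketch below assumes Python 3 and a cut-down version of the question's XML; it is only meant to show the lookup, not a full solution.

```python
import xml.etree.ElementTree as ET

# Cut-down sample: only the subelements needed for the lookup.
root = ET.fromstring(
    "<ThisIsMyData>"
    "<ITEM><Label>RED</Label><file>test.png</file></ITEM>"
    "<ITEM><Label>Blue</Label><file>blue.png</file></ITEM>"
    "</ThisIsMyData>"
)

# [file='blue.png'] keeps only ITEM elements that have a <file> child
# whose text is exactly 'blue.png'.
item = root.find("./ITEM[file='blue.png']")

# Label sits on the same level as file, so it is found relative to the ITEM.
item.find("Label").text = "YELLOW"
```

The predicate compares the complete text, so a lookup that has to join Path and file first is probably easier with a plain loop over the ITEM elements.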

answered Jan 30 '12 at 11:53 by yurisich · edited May 23 '17 at 12:15 by Community

Updated the question with more info and my progress with ElementTree, per Droogans' suggestion – LessPythonic Jan 30 '12 at 12:53
