Hi, I have started learning Python and want to use it to modify an XML file.

I have been looking for information on the best approach, but frankly I got a little lost. There are so many ways of manipulating XML files: ElementTree, lxml, minidom, etc. Could someone point me in the right direction, or to some code I can wrap my head around? I have started experimenting with lxml but haven't gotten any further than printing all elements yet.

Here is what I am trying to do:

  1. Read a line from the csv file. Load in Label and FullPath.
  2. Look in the XML file for the ITEM with a matching FullPath
  3. Change the FLAG1 for that ITEM to TRUE
  4. Change the FLAG2 and FLAG3 for that ITEM to FALSE
  5. Change the Label for that ITEM to the Label from the CSV file.
  6. Write out new.xml

Below is my XML structure. The two records below repeat about 10,000 times in the file.

<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>  
    <Flag3>FALSE</Flag3>
    <Label>RED</Label> <<-2- After finding 1 I need to change THIS(only this)
    <Path>C:\\test\\</Path> <-1- I need to find this 
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
 </ThisIsMyData>

So I have a root (ThisIsMyData), then a lot of ITEM elements. Each of them has 7 subelements.

This is what my CSV file looks like, followed by the XML output I need:

  Label;FullPath
  YELLOW;C:\\test\\test.png
  YELLOW;c:\\test\\test2\\blue.png

 <ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>FALSE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>TRUE</Flag3>
    <Label>YELLOW</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>FALSE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>TRUE</Flag3>
    <Label>YELLOW</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
 </ThisIsMyData>

Pastebin link in case layout gets messed up :

http://pastebin.com/embed_js.php?i=QEx2ZGuY

I am trying ElementTree right now using this example: http://pymotw.com/2/xml/etree/ElementTree/parse.html. I have managed to search the XML for a certain element name and print its contents, but I still do not see a way of finding a matching element on the same level.

from xml.etree import ElementTree

# Parse the XML file into an element tree
with open('mydata.xml', 'rt') as f:
    tree = ElementTree.parse(f)

# Print the tag and text of every <file> element, at any depth
for node in tree.findall('.//file'):
    FileName = node.tag, node.text
    print(FileName)

Output :

('file', 'test.png')
('file', 'blue.png')
twasbrillig
  • You can do what you want with any of the parsers you mentioned. Can you be more concrete about the problem you cannot get past? – Bogdan Jan 30 '12 at 10:40

3 Answers

Here's a quick example of how to do what I think you want using lxml.etree and xpath.

from io import StringIO
from lxml import etree

# Stand-ins for the real XML and CSV files. The doubled backslashes are
# Python string escapes: each pair is a single backslash in the data.
xmlfile = StringIO("""
<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>RED</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
 </ThisIsMyData>
""".strip())

datafile = StringIO("""
Label;FullPath
YELLOW;C:\\test\\test.png
YELLOW;c:\\test\\test2\\blue.png
""".strip())

# Read "csv". Simple, no error checking, skip first line.
filenameToLabel = {}
for label, fullpath in (x.strip().split(';') for x in datafile.readlines()[1:]):
  filenameToLabel[fullpath] = label

def first(seq, default=None):
  """xpath helper function: first item of seq, or default if seq is empty"""
  for item in seq:
    return item
  return default

doc = etree.XML(xmlfile.read())

for item in doc.xpath('//ITEM'):
  # Rebuild the full path by joining the text of <Path> and <file>
  item_filename = first(item.xpath('./Path/text()'), '').strip() + first(item.xpath('./file/text()'), '').strip()
  label = filenameToLabel.get(item_filename)
  if label is not None:
    first(item.xpath('./Flag1')).text = 'TRUE'
    first(item.xpath('./Flag2')).text = 'FALSE'
    first(item.xpath('./Flag3')).text = 'FALSE'
    first(item.xpath('./Label')).text = label

# tostring() returns bytes on Python 3, so decode before printing
print(etree.tostring(doc).decode())

Yields

<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>YELLOW</Label>
    <Path>C:\test\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>YELLOW</Label>
    <Path>c:\test\test2\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>
MattH
  • Great, thank you for the working example. The answer was in this line: item_filename = first(item.xpath('./Path/text()'),'').strip() + first(item.xpath('./file/text()'),'').strip() – LessPythonic Jan 30 '12 at 15:19

First of all, use Python's csv module to get your data from the CSV file. A plain string split will also work fine if the data is not big.

Then create your XML using etree.XML.
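
The CSV-reading step might look like the sketch below. It assumes Python 3 and the semicolon-delimited Label;FullPath layout from the question; the helper name load_labels and the in-memory sample are illustrative and not part of the original post.

```python
import csv
from io import StringIO

# Sample data standing in for the real CSV file (assumed layout: Label;FullPath).
# The r prefix keeps the backslashes literal.
SAMPLE_CSV = r"""Label;FullPath
YELLOW;C:\test\test.png
YELLOW;c:\test\test2\blue.png
"""

def load_labels(fileobj):
    """Return a dict mapping FullPath -> Label (hypothetical helper)."""
    reader = csv.DictReader(fileobj, delimiter=';')
    return {row['FullPath']: row['Label'] for row in reader}

labels = load_labels(StringIO(SAMPLE_CSV))
print(labels)
```

For a file on disk, the same helper can be given the result of open(path, newline=''), where path is whatever the CSV file is called.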

example:

>>> from lxml import etree
>>> csv_value = 'C:\\test\\'
>>> st = '<document>'+'<Flag1>FALSE</Flag1>' + '<Flag2>FALSE</Flag2>'+'<Path>' + csv_value + '</Path>' + '</document>'
>>> tree = etree.XML(st)
>>> etree.tostring(tree)
'<document><Flag1>FALSE</Flag1><Flag2>FALSE</Flag2><Path>C:\\test\\</Path></document>'

Fetching csv_value is left to you as an exercise.
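
As an alternative to concatenating strings, the same document can be built from element objects, which takes care of escaping characters such as & or < in the value. The sketch below assumes Python 3 and uses the standard library's xml.etree.ElementTree instead of lxml; lxml.etree provides Element and SubElement with the same call shape.

```python
import xml.etree.ElementTree as ET

csv_value = 'C:\\test\\'  # same value as in the session above

# Build <document> with one child element per (tag, text) pair
document = ET.Element('document')
for tag, text in (('Flag1', 'FALSE'), ('Flag2', 'FALSE'), ('Path', csv_value)):
    ET.SubElement(document, tag).text = text

# encoding='unicode' makes tostring() return str instead of bytes
xml_text = ET.tostring(document, encoding='unicode')
print(xml_text)
```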

Also take a look at this question.

RanRag
  • Thank you for the answer, but I think I was too vague in describing what I am trying to do (probably because I am using the wrong terms). This just prints out a constructed XML. What I need to do is : <1> Look in the first XML file for the SubElements of ITEM containing the second string in the csv file. Then edit the SubElement on the same level named : – LessPythonic Jan 30 '12 at 11:11

I find that Beautiful Soup, and its sister, Beautiful Stone Soup, have really good, terse, example-based documentation that lends itself to diving in and trying things out on real world examples.

But I've also heard that ElementTree is considered by some to be the gold standard in Python.
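
For comparison, here is a rough sketch of the relabeling task from the question using only the standard library's ElementTree. It assumes Python 3, that every ITEM has Flag1, Flag2, Flag3 and Label children, and that the full path is the Path text followed by the file text. The function name relabel and the one-item sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# One-item sample standing in for the real XML file
SAMPLE_XML = r"""<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>RED</Label>
    <Path>C:\test\</Path>
    <file>test.png</file>
  </ITEM>
</ThisIsMyData>"""

# FullPath -> Label, as it would be read from the CSV file
labels = {r'C:\test\test.png': 'YELLOW'}

def relabel(root, labels):
    """Update each ITEM whose Path + file is a key in labels; return the count."""
    changed = 0
    for item in root.iter('ITEM'):
        full_path = item.findtext('Path', '').strip() + item.findtext('file', '').strip()
        label = labels.get(full_path)
        if label is None:
            continue
        # Flag values follow the numbered steps in the question
        item.find('Flag1').text = 'TRUE'
        item.find('Flag2').text = 'FALSE'
        item.find('Flag3').text = 'FALSE'
        item.find('Label').text = label
        changed += 1
    return changed

root = ET.fromstring(SAMPLE_XML)
changed = relabel(root, labels)
print(changed, root.find('ITEM/Label').text)
```

With real files, ET.parse('mydata.xml') would supply the tree, and tree.write('new.xml') would save the result after calling relabel(tree.getroot(), labels).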

yurisich