
I am trying to parse an iTunes playlist using ElementTree's iterparse(), but I am getting the following error:

AttributeError: 'Element' object has no attribute 'xpath'

My code is below:

import xml.etree.ElementTree as ET
context = ET.iterparse(file, events=("start", "end"))
# turn it into an iterator
context = iter(context)
# get the root element
event, root = context.next()
for event, elem in context:
    z = elem.xpath(".//key")
    elem.clear()
    root.clear()
print z

What am I doing wrong? The file is too big, so I have to use iterparse() anyway.

Pedro Romano
Volatil3
  • Try `elem.findall(".//key")` instead of `elem.xpath(".//key")`. – Pedro Romano Nov 19 '12 at 14:57
  • `elem.clear()`, among other things, removes all sub-elements. Have you tried removing `elem.clear()` and `root.clear()` and putting `print z` **inside** the `for` loop? – Pedro Romano Nov 19 '12 at 15:54
  • OK, yes, it returns results inside the loop once I comment out the clear() calls. What do you suggest? – Volatil3 Nov 19 '12 at 16:05
  • Why do you need the `clear()` method calls? – Pedro Romano Nov 19 '12 at 16:07
  • I read about it here: http://effbot.org/zone/element-iterparse.htm It is for lxml. I thought it would be an issue for ElementTree in the built-in library too. Check under the heading **Incremental Parsing** – Volatil3 Nov 19 '12 at 16:10
  • I understand. However, you will need to do **all** the processing you need with `z` **before** you call `clear()` because it will be lost after that. – Pedro Romano Nov 19 '12 at 16:14
  • This is plist XML: http://pastebin.com/gU5tX9nR I have to play with each dict block to get relevant song information. I have now changed findall() to **z = elem.findall(".//dict/dict/key")**, checking it now – Volatil3 Nov 19 '12 at 16:19
  • Ok sorted it out: Here is the updated code: http://pastie.org/5401946 Love you bro! Can you make it an ANSWER so that I can accept? – Volatil3 Nov 19 '12 at 16:52
  • One issue I found with ElementTree is that it works _sequentially_. The plist document (http://pastebin.com/gU5tX9nR) contains tags like _key_, _integer_ and _string_. There's no way to fetch a particular string/integer tag by doing some *next* operation. How do I tackle that? – Volatil3 Nov 22 '12 at 11:29

1 Answer


xml.etree.ElementTree provides limited support for XPath expressions through the find, findall and findtext methods of its Element class. There is no xpath method on Element (that method belongs to lxml), which is why you are getting the error.
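As a small illustration of that supported subset, here is a sketch using a made-up plist-like fragment (the sample string is an assumption, not the actual iTunes file). The path-based methods work on a standard-library Element, while xpath is simply not there:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely shaped like a plist fragment.
sample = (
    "<dict><key>Name</key><string>Song A</string>"
    "<dict><key>Artist</key><string>Band B</string></dict></dict>"
)
root = ET.fromstring(sample)

all_keys = [k.text for k in root.findall(".//key")]  # every descendant <key>
first_key = root.find("key").text                    # first direct child <key>
first_string = root.findtext("string")               # text of first direct child <string>
has_xpath = hasattr(root, "xpath")                   # no such attribute on the stdlib Element

print(all_keys, first_key, first_string, has_xpath)
```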

Also, if you call clear() on an element to free memory, do it only after you have finished processing that element and all of its children, because clear() removes the element's sub-elements, text and attributes.

Therefore, you need to change your code to something similar to the following:

for event, elem in context:
    for child in elem.findall(".//key"):
        pass  # process child here, before anything is cleared
    elem.clear()
    root.clear()
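For a fuller picture, the following is a minimal Python 3 sketch of the same idea, not a definitive implementation. The `SAMPLE` data, the `iter_tracks` helper and the assumption that each track is a `dict` nested three levels deep (plist > dict > dict > dict, in line with the `.//dict/dict/key` path mentioned in the comments) are all illustrative. It acts only on "end" events, when an element is complete, pairs each `key` with the value element that follows it, and calls clear() only afterwards:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for an iTunes library file.
SAMPLE = b"""<plist version="1.0">
<dict>
  <key>Tracks</key>
  <dict>
    <key>1</key>
    <dict>
      <key>Track ID</key><integer>1</integer>
      <key>Name</key><string>Song A</string>
    </dict>
    <key>2</key>
    <dict>
      <key>Track ID</key><integer>2</integer>
      <key>Name</key><string>Song B</string>
    </dict>
  </dict>
</dict>
</plist>"""


def iter_tracks(source):
    """Yield one {key: value} mapping per track <dict> (illustrative helper)."""
    depth = 0  # number of <dict> elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "dict":
            continue
        if event == "start":
            depth += 1
            continue
        # "end" event: this <dict> and all of its children are complete.
        if depth == 3:  # assumed nesting level of a single track
            children = list(elem)
            # Children alternate <key>, value, <key>, value, ...
            yield {k.text: v.text for k, v in zip(children[::2], children[1::2])}
            elem.clear()  # safe now: the track has already been processed
        depth -= 1


tracks = list(iter_tracks(io.BytesIO(SAMPLE)))
print(tracks)
```

Note that a cleared element stays attached to its parent as an empty shell, so for very large files you may also want to drop those references from the parent once you are done with them.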
Pedro Romano