MemoryError on Python minidom

Question

I'm getting a MemoryError while parsing XML with minidom (run on a server; file path changed):

Traceback (most recent call last):
  File "python.py", line 19, in <module>
    xmldoc = minidom.parseString(unicode(data,errors='ignore'))
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1928, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 753, in start_element_handler
    _append_child(self.curNode, node)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 287, in _append_child
    last.__dict__["nextSibling"] = node
MemoryError

The XML feed I'm parsing is huge, so that might be the problem. What can I do about it?

Yes, you are running out of memory. Maybe another xml parser would use less memory. — Justin Peel, Feb 11 '12 at 17:14
I recommend [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/). — Blender, Feb 11 '12 at 17:15
Seems like this is the issue. Trying different options now. Justin, feel free to post as an answer so I can accept it. — Martti Laine, Feb 11 '12 at 18:17
@Blender: BeautifulSoup 3 uses regular expressions internally, and will likely require as much memory as minidom, probably even more. BeautifulSoup 4 uses lxml for XML parsing, anyway, so it's just an additional dependency and an additional layer of complexity with little gain. — , Feb 11 '12 at 18:42
I'm using `iterparse` with help from [here](http://www.ibm.com/developerworks/xml/library/x-hiperfparse/). I already posted another question regarding encoding [here](http://stackoverflow.com/questions/9243005/ignore-encoding-errors-in-python-iterparse) =) — Martti Laine, Feb 11 '12 at 18:51
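
For readers who land here with the same error, the `iterparse` approach mentioned in the last comment might look roughly like the sketch below. It is not the asker's actual code: the `iter_items` helper, the `feed`/`item`/`id`/`name` element names, and the in-memory sample are all hypothetical, and it assumes the records are direct children of the root element. The point is that each record is handled and then released, so memory use stays roughly flat instead of growing with the size of the feed as it does with minidom.

```python
import io
import xml.etree.ElementTree as ET


def iter_items(source, tag):
    """Yield a dict (child tag -> text) for each <tag> element, then free it.

    Assumes the <tag> elements are direct children of the document root.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a reference to it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # Copy out what we need before the element is released.
            yield dict((child.tag, child.text) for child in elem)
            # Detach already-processed children so the tree does not keep growing.
            root.clear()


# Tiny in-memory feed standing in for the real (huge) file (hypothetical data).
sample = io.BytesIO(
    b"<feed>"
    b"<item><id>1</id><name>first</name></item>"
    b"<item><id>2</id><name>second</name></item>"
    b"</feed>"
)
items = list(iter_items(sample, "item"))
```

For a real feed, a file path or an open file object would be passed instead of the `BytesIO` sample, and the records would be consumed one at a time in a loop rather than collected with `list()`, which would bring all of them back into memory.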
