I'm getting a strange MemoryError while parsing XML with minidom (the script runs on a server; the file path has been changed):

Traceback (most recent call last):
  File "python.py", line 19, in <module>
    xmldoc = minidom.parseString(unicode(data,errors='ignore'))
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1928, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 753, in start_element_handler
    _append_child(self.curNode, node)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 287, in _append_child
    last.__dict__["nextSibling"] = node
MemoryError

The XML feed I'm parsing is huge, so that might be the problem. What can I do about it?

Martti Laine
  • Yes, you are running out of memory. Maybe another xml parser would use less memory. – Justin Peel Feb 11 '12 at 17:14
  • I recommend [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/). – Blender Feb 11 '12 at 17:15
  • Seems like this is the issue. Trying different options now. Justin, feel free to post as an answer so I can accept it. – Martti Laine Feb 11 '12 at 18:17
  • @Blender: BeautifulSoup 3 uses regular expressions internally, and will likely require as much memory as minidom, probably even more. BeautifulSoup 4 uses lxml for XML parsing, anyway, so it's just an additional dependency and an additional layer of complexity with little gain. –  Feb 11 '12 at 18:42
  • I'm using `iterparse` with help from [here](http://www.ibm.com/developerworks/xml/library/x-hiperfparse/). I already posted another question regarding encoding [here](http://stackoverflow.com/questions/9243005/ignore-encoding-errors-in-python-iterparse) =) – Martti Laine Feb 11 '12 at 18:51
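For anyone landing here with the same MemoryError, here is a minimal sketch of the `iterparse` approach mentioned in the comments. It is not the asker's actual code: the helper name `iter_items` and the `feed`/`item` element names are hypothetical, and it assumes the records are direct children of the root element. The idea is to handle one record at a time and clear the root after each one, so the full tree that minidom builds is never held in memory. It is written for a current Python; on 2.6 the dict comprehension would need to be `dict((c.tag, c.text) for c in elem)`.

```python
import io
import xml.etree.ElementTree as ET


def iter_items(source, tag):
    """Yield a dict of child tag -> text for each <tag> element, one at a time.

    `iter_items` is a hypothetical helper; `source` is a filename or a
    binary file object, and `tag` is the name of the repeated record element.
    """
    # Ask for "start" events as well, so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            # Drop the records parsed so far; without this the tree keeps
            # growing and memory use ends up similar to a full DOM parse.
            root.clear()


# Tiny in-memory stand-in for the real feed (element names are made up).
sample = (
    b"<feed>"
    b"<item><id>1</id><name>a</name></item>"
    b"<item><id>2</id><name>b</name></item>"
    b"</feed>"
)
records = list(iter_items(io.BytesIO(sample), "item"))
```

With a real feed you would pass the file path or an open binary file instead of the `BytesIO` object, and process each yielded record inside the loop rather than collecting them all into a list.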
