
I am trying to open an XML file and parse it, but when I try to open it, the file never seems to open at all; the script just keeps running. Any ideas?

from xml.dom import minidom
Test_file = open('C::/test_file.xml','r')
xmldoc = minidom.parse(Test_file)

Test_file.close()

for i in xmldoc:
     print('test')

The file is 180.288 KB. Why does it never make it to the print portion?

Trying_hard
  • Why do you have two colons? Try `r'C://test_file.xml'`. – alecxe Sep 16 '13 at 18:02
  • still doesn't get to print when I change it to this – Trying_hard Sep 16 '13 at 18:10
  • Remove the XML stuff and check the file path by doing something like ``print Test_file`` or ``print Test_file.readline()``. – Hew Wolff Sep 16 '13 at 18:15
  • Is the size of your XML document one hundred and eighty kilobytes or one hundred and eighty thousand kilobytes? (I'm spelling it out due to thousands separators being different in different cultures.) 180 kilobytes ought to be small enough for minidom to handle, but if you genuinely have 180 megabytes, avoid minidom, or any other XML DOM implementation for that matter, as DOM doesn't work at all well for documents of that size. Consider instead SAX or StAX. – Luke Woodward Sep 16 '13 at 18:29
  • The `for` loop won't work. An `xml.dom.minidom.Document` instance is not an iterable sequence. – mzjn Sep 16 '13 at 19:02
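To illustrate mzjn's point: the `Document` itself can't be looped over, but `getElementsByTagName('*')` returns a `NodeList` of every element in document order, and a `NodeList` can be. This is a minimal Python 3 sketch; it parses a small illustrative string with `minidom.parseString` so it runs without the file on disk.

```python
from xml.dom import minidom

# Illustrative document; parseString() avoids needing the file on disk.
xml_text = """<a>
  <b>testing 1</b>
  <c>testing 2</c>
</a>"""

xmldoc = minidom.parseString(xml_text)

# The Document itself is not iterable, but getElementsByTagName('*')
# returns a NodeList (a list subclass) of every element in document order.
for element in xmldoc.getElementsByTagName('*'):
    print(element.tagName)  # prints a, b, c on separate lines
```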

1 Answer


Running your Python code with a few adjustments:

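from xml.dom import minidom

# Single colon in the path; 'with' closes the file once parsing is done.
with open('C:/test_file.xml', 'r') as Test_file:
    xmldoc = minidom.parse(Test_file)

# Recursively print a node and all of its descendants.
def printNode(node):
    print(node)
    for child in node.childNodes:
        printNode(child)

printNode(xmldoc.documentElement)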

With this sample input as test_file.xml:

<a>
  <b>testing 1</b>
  <c>testing 2</c>
</a>

Yields this output:

<DOM Element: a at 0xbc56e8>
<DOM Text node "u'\n  '">
<DOM Element: b at 0xbc5788>
<DOM Text node "u'testing 1'">
<DOM Text node "u'\n  '">
<DOM Element: c at 0xbc5828>
<DOM Text node "u'testing 2'">
<DOM Text node "u'\n'">
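The `u'\n  '` text nodes in that output are the newlines and indentation between the tags, which minidom keeps as ordinary text nodes. If you only care about elements and real text, a variant of `printNode` can skip them. This is a Python 3 sketch; `describe_tree` is a hypothetical helper name, and it returns lines rather than printing them so the result is easy to check.

```python
from xml.dom import minidom

def describe_tree(node, depth=0, lines=None):
    """Return one indented line per element or non-blank text node."""
    if lines is None:
        lines = []
    if node.nodeType == node.TEXT_NODE:
        # Whitespace-only text nodes come from the file's indentation; skip them.
        if node.data.strip():
            lines.append("  " * depth + repr(node.data))
        return lines
    lines.append("  " * depth + node.nodeName)
    for child in node.childNodes:
        describe_tree(child, depth + 1, lines)
    return lines

sample = "<a>\n  <b>testing 1</b>\n  <c>testing 2</c>\n</a>"
for line in describe_tree(minidom.parseString(sample).documentElement):
    print(line)
```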

Notes:

  • As @LukeWoodward mentioned, avoid DOM-based libraries for large inputs; 180K, however, should be fine. With 180M, minidom.parse() may not return at all before running out of memory (MemoryError).
  • As @alecxe mentioned, you should eliminate the extraneous ':' in the file spec. You should have seen error output along the lines of IOError: [Errno 22] invalid mode ('r') or filename: 'C::/test_file.xml'.
  • As @mzjn mentioned, xml.dom.minidom.Document is not iterable. You should have seen error output along the lines of TypeError: iteration over non-sequence.
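For the 180M case, Python's standard library has incremental parsers that avoid building a full DOM: `xml.sax`, and `xml.etree.ElementTree.iterparse`, which this sketch uses (it swaps in ElementTree rather than the SAX or StAX APIs named in the comments). It assumes you only need to count elements with a given tag; `count_elements` is a hypothetical helper, and the in-memory input stands in for a real filename.

```python
import io
import xml.etree.ElementTree as ET

def count_elements(source, tag):
    """Count elements named `tag` without building a full DOM."""
    count = 0
    # iterparse yields each element as soon as its end tag has been read.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Release this element's text, attributes and children once handled.
        elem.clear()
    return count

# Small in-memory stand-in for a large file; pass a filename in real use.
sample = io.BytesIO(b"<a><b>testing 1</b><b>testing 2</b><c/></a>")
print(count_elements(sample, "b"))
```

Note that `elem.clear()` empties each element but leaves it attached to its parent, so for a huge document with many top-level children you may also want to clear the root element periodically.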
kjhughes