
I am new to Python. I am trying to parse an XML document to count the total number of words. I tried the program below to count the number of words in the file, but I get the following error:

After getting this error, I installed "utils", but it still appears. Is there any other easy way to get the total number of words in an XML document in Python? Please help!

Traceback (most recent call last):
  File "C:\Python27\xmlp.py", line 1, in <module>
    from xml.dom import utils,core
ImportError: cannot import name utils

Code:

from xml.dom import utils,core
import string
reader = utils.FileReader('Greeting.xml')

doc = reader.document

Storage = ""

for n in doc.documentElement.childNodes:
 if n.nodeType == core.TEXT_NODE:
  # Accumulate contents of text nodes
  Storage = Storage + n.nodeValue
print len(string.split(Storage))
adsa lila

2 Answers


You'll find it easier to use ElementTree, e.g.:

from xml.etree import ElementTree as ET

xml = '<a>one two three<b>four five<c>Six Seven</c></b></a>'
tree = ET.fromstring(xml)
total = sum(len(text.split()) for text in tree.itertext())
# 7

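But use tree = ET.parse('Greeting.xml').getroot() to load your real data. ET.parse returns an ElementTree object, which has no itertext method; the root element it wraps does.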
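A minimal sketch of the file-based version, assuming you want every word in every text node of the document. `count_words_in_file` is a hypothetical helper name, and an in-memory `io.BytesIO` object stands in for a real file such as `Greeting.xml`. The key step is `getroot()`: `ET.parse` gives back an `ElementTree` wrapper, while `itertext()` lives on `Element`.

```python
import io
import xml.etree.ElementTree as ET

def count_words_in_file(source):
    """Count whitespace-separated words in all text of an XML document.

    `source` may be a filename (e.g. 'Greeting.xml') or a binary file object.
    """
    tree = ET.parse(source)   # ElementTree: document wrapper, no itertext()
    root = tree.getroot()     # Element: the root element, which has itertext()
    return sum(len(text.split()) for text in root.itertext())

# In-memory stand-in for a file on disk.
sample = io.BytesIO(b'<a>one two three<b>four five<c>Six Seven</c></b></a>')
print(count_words_in_file(sample))  # 7
```

The same calls (`ET.parse`, `getroot`, `itertext`, `io.BytesIO`) also exist in Python 2.7, so this should run there unchanged.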

Jon Clements
  • I tried your code; my XML file is around 12 MB, but I get this error: `Traceback File "C:\Python27\xmlp.py", line 3, in tree = ET.parse('myxml.xml') File "C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse tree.parse(source, parser) File "C:\Python27\lib\xml\etree\ElementTree.py", line 656, in parse parser.feed(data) File "C:\Python27\lib\xml\etree\ElementTree.py", line 1642, in feed self._raiseerror(v) File "C:\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror raise err ParseError: undefined entity ö: line 47, column 18` – adsa lila Feb 27 '15 at 12:35
  • @adsalila right... so it's not well-formed XML then... Take a look at http://stackoverflow.com/questions/7237466/python-elementtree-support-for-parsing-unknown-xml-entities for how to work around it if you need to – Jon Clements Feb 27 '15 at 12:37
  • But if I try your code on a small XML file, I get this error: `File "C:\Python27\xmlp.py", line 4, in total = sum(len(text.split()) for text in tree.itertext()) AttributeError: 'ElementTree' object has no attribute 'itertext'` – adsa lila Feb 27 '15 at 13:05
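One possible workaround for the "undefined entity" error, assuming the offending references are HTML named entities such as `&ouml;` (this is a text pre-processing technique, not necessarily the one described in the linked question). The hypothetical helper `replace_html_entities` swaps each known HTML entity for its character before parsing, leaving the five entities XML predefines (and numeric references like `&#246;`) for the parser to handle. It is a sketch: it does not special-case entity-like text inside comments or CDATA sections.

```python
import re
from html.entities import name2codepoint  # Python 2.7: htmlentitydefs.name2codepoint
import xml.etree.ElementTree as ET

# Entities every XML parser already understands; leave them untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def replace_html_entities(text):
    """Replace HTML named entities (e.g. &ouml;) with the characters they stand for."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep as-is: predefined or unknown name
        return chr(name2codepoint[name])  # Python 2.7: unichr
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

raw = "<p>Sch&ouml;n &amp; gut</p>"
root = ET.fromstring(replace_html_entities(raw))
print(root.text)                                           # Schön & gut
print(sum(len(t.split()) for t in root.itertext()))        # 3
```

For a real file you would read the whole text first, e.g. `open('myxml.xml', encoding='utf-8').read()` (assuming the file is UTF-8), pass it through the helper, and then call `ET.fromstring` on the result; a 12 MB file fits comfortably in memory.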

IMHO you do not need utils and core; just use from xml.dom import minidom

Look at a similar example here: Python XML File Open
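A minimal sketch of the minidom approach suggested here, assuming you want the words in every text node of the document. The question's loop only looked at text nodes that are direct children of the root element, so this version walks the tree recursively. `count_words_dom` is a hypothetical helper name; for a file you would pass `minidom.parse('Greeting.xml').documentElement` instead of parsing a string.

```python
from xml.dom import minidom

def count_words_dom(node):
    """Recursively count whitespace-separated words in all text under `node`."""
    total = 0
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            # Text and CDATA nodes carry their content in `.data`.
            total += len(child.data.split())
        elif child.hasChildNodes():
            # Descend into nested elements.
            total += count_words_dom(child)
    return total

doc = minidom.parseString('<a>one two three<b>four five<c>Six Seven</c></b></a>')
print(count_words_dom(doc.documentElement))  # 7
```

Only standard-library calls are used (`minidom.parseString`, `childNodes`, `nodeType`, `hasChildNodes`), so no extra package such as "utils" needs to be installed.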

frankegoesdown