I am trying to parse an XML file using Python's ElementTree module (xml.etree.ElementTree). However, I am getting an OverflowError, perhaps because the XML file is large (~2 GB). I saw that this question has been asked before, and the answers mostly suggest that a 32-bit Python is being used. Mine is certainly 64-bit, as the interpreter itself reports at start-up.
Here are the lines of code I run:
import os
import xml.etree.ElementTree as ET

xmlPath = os.path.join(codePath, 'data', 'nonmeta', name + '.xml')
# Read the whole file (~2 GB) into a single string
xml_data = open(xmlPath, encoding="utf8").read()
# Parse the string in one call
root = ET.XML(xml_data)
The last line raises the following exception:
File "C:\ProgramData\Anaconda3\lib\xml\etree\ElementTree.py", line 1314, in XML
    parser.feed(text)
OverflowError: size does not fit in an int
I am working on Windows 7 with Python 3.6.0, using IPython 5.1.0 as the interpreter. Any suggestions on how to parse the data without getting this error?
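For reference, here is a minimal sketch of a chunked alternative. It rests on the assumption (not confirmed) that the error comes from handing the whole ~2 GB string to a single parser.feed call. The function name parse_in_chunks and the 1 MiB chunk_size are hypothetical choices for illustration; ET.XMLParser, feed, and close are the standard ElementTree calls.

```python
import xml.etree.ElementTree as ET


def parse_in_chunks(path, chunk_size=1 << 20):
    """Feed the file to the parser piece by piece and return the root element.

    Hypothetical helper: chunk_size (1 MiB here) is an arbitrary value,
    chosen only to keep each feed() call far below the size of a C int.
    """
    parser = ET.XMLParser()
    # Open in binary mode so the parser honours the encoding declared in the XML.
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            parser.feed(chunk)
    # close() finishes parsing and returns the root element.
    return parser.close()
```

This would be called as root = parse_in_chunks(xmlPath). It still builds the complete tree in memory, so it only addresses the size of each individual feed call, not the overall memory footprint.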
EDIT: Why I believe it is not a 32-bit issue
The operating system is Windows 7. The computer properties show:
System type: 64-bit Operating System
And when I start my IPython interpreter from Spyder, the start-up message says:
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
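The same thing can be checked from inside the running interpreter rather than from the banner. A small sketch, using only the standard struct and sys modules; the helper names interpreter_bits and is_64bit are made up for illustration.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # "P" is the struct format code for a C void pointer.
    return struct.calcsize("P") * 8


def is_64bit():
    """True when the largest Py_ssize_t value exceeds the 32-bit range."""
    return sys.maxsize > 2**32


print(interpreter_bits(), is_64bit())
```

On a 64-bit build, the first value would be 64 and the second True, which is what I would expect to see given the banner above.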