I would like to use Python to read a VTU file, which is XML but may contain raw binary data. The specification says:
There is one case in which the file is not a valid XML document. When the AppendedData section is not encoded as base64, raw binary data is present that may violate the XML specification. This is not default behavior, and must be explicitly enabled by the user.
For example, consider dragon.vtu:
<VTKFile type="UnstructuredGrid" version="1.0" byte_order="LittleEndian" header_type="UInt64">
<UnstructuredGrid>
<Piece NumberOfPoints="69827" NumberOfCells="139650">
<Cells>
<DataArray type="Int64" Name="connectivity" format="appended" RangeMin="" RangeMax="" offset="837932"/>
<DataArray type="Int64" Name="offsets" format="appended" RangeMin="" RangeMax="" offset="4189540"/>
<DataArray type="UInt8" Name="types" format="appended" RangeMin="" RangeMax="" offset="5306748"/>
</Cells>
</Piece>
</UnstructuredGrid>
<AppendedData encoding="raw">
_$É�����ıAdÌAÁÊÃÿ@>yAn£GÁÏAA(~AÁþ`AF¶Áo.@Ô«¬A3Ä|Ásc2@ï8±A cÁÉX@®AZ/AϱÁ:»AA)³Á(ÉAs!AFÁ\A½A*ÁyA*)AéÔÁØÓAÀ¡Aã_ÁóA`öBÌ]gADé¸AdBdÌnA|r·AhB^ºnAzºAȦ
[...]
Naively doing
import xml.etree.ElementTree as ET
parser = ET.XMLParser()
tree = ET.parse("dragon.vtu", parser)
does not work:
Traceback (most recent call last):
  File "f.py", line 3, in <module>
    tree = ET.parse("dragon.vtu", parser)
  File "/usr/lib/python3.7/xml/etree/ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.7/xml/etree/ElementTree.py", line 604, in parse
    parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 28, column 5
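One workaround I can imagine is to cut the raw payload out of the file before handing the rest to the XML parser. Below is a minimal sketch of that idea, not a tested solution. The helper names split_vtu and read_array are made up for illustration. The sketch assumes that the payload starts right after the first underscore following the AppendedData tag, that each array block begins with a UInt64 byte count (as header_type="UInt64" suggests), that the data is little-endian, and that no compressor is in use.

```python
import struct
import xml.etree.ElementTree as ET

# Assumed mapping from VTK type names to struct codes (only a subset).
_STRUCT_CODES = {"Int32": "i", "Int64": "q", "UInt8": "B",
                 "Float32": "f", "Float64": "d"}


def split_vtu(data: bytes):
    """Split VTU file contents into (XML root element, raw appended bytes)."""
    tag_start = data.find(b"<AppendedData")
    if tag_start == -1:
        # No appended section, so the whole file should be valid XML.
        return ET.fromstring(data), b""
    tag_end = data.index(b">", tag_start) + 1
    # The raw payload begins right after the underscore marker.
    payload_start = data.index(b"_", tag_end) + 1
    # Search from the end so payload bytes cannot produce a false match.
    payload_end = data.rfind(b"</AppendedData>")
    if payload_end < payload_start:
        raise ValueError("closing </AppendedData> tag not found")
    # Rejoin the header and the closing tags, leaving the payload out.
    root = ET.fromstring(data[:tag_end] + data[payload_end:])
    return root, data[payload_start:payload_end]


def read_array(elem, payload: bytes):
    """Decode one appended DataArray (assumes uncompressed, little-endian)."""
    offset = int(elem.get("offset"))
    # Each block is assumed to start with a UInt64 holding its byte count.
    (nbytes,) = struct.unpack_from("<Q", payload, offset)
    code = _STRUCT_CODES[elem.get("type")]
    count = nbytes // struct.calcsize("<" + code)
    return struct.unpack_from(f"<{count}{code}", payload, offset + 8)
```

With this, something like root, raw = split_vtu(open("dragon.vtu", "rb").read()) followed by read_array(arr, raw) for each arr in root.iter("DataArray") might recover the arrays, but I have not verified this against the file above.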
Any hints?