I have 100 XML files and I want to parse them in Python.

Example of one file:

<?xml version="1.0" <?xml version="1.0" encoding="UTF-8"?>
<api version="3.2">
<peoplecount>
<entry id="1" name="" userid="">
<count datetime="2022.07.11 14:16:20"  realin="0" realout="1" realpass="0" queuetime="0" />
<count datetime="2022.07.11 14:21:57"  realin="0" realout="1" realpass="0" queuetime="0" />
<count datetime="2022.07.11 14:23:11"  realin="0" realout="1" realpass="0" queuetime="0" />
%skipzero[1,2]%</entry>
</peoplecount>
<timezone name="Europe/Moscow" offset="10800"/>
</api>

I use xml.etree.ElementTree to parse the XML file:

import xml.etree.ElementTree as ET

tree = ET.parse('pathname.xml')
root = tree.getroot()

for child in root.iter('count'):
    print(child.tag, child.attrib)

Parsing fails with an error about bad format:

ParseError: not well-formed (invalid token): line 1, column 0

I know the reason for the error: the stray '<?xml version="1.0" ' at the beginning of the file has to be deleted. But since there are more than 100 files, opening each one and deleting it by hand is not practical.

Is there a way to remove it before parsing? Using ET.fromstring fails with the same error.

user_Dima
  • Why not just have the script itself make copies of the files, but with the first broken tag removed? The text ` – Random Davis Mar 03 '23 at 20:14
  • Or better yet, why not just use [fromstring](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.fromstring)? You load the file into a string, remove the broken tag, and then parse it right there, no need to make copies of any files. – Random Davis Mar 03 '23 at 20:18
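
The suggestion in the comments (load each file, drop the broken fragment, then parse in memory) could look roughly like the sketch below. It assumes every file starts with exactly the fragment shown in the question; the names `BROKEN_PREFIX`, `parse_repaired`, and `parse_folder` are made up for illustration, and the folder path is a placeholder. Files are read as bytes so the remaining `encoding="UTF-8"` declaration is still honoured by the parser.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: this is the stray fragment that precedes the real XML
# declaration, copied from the sample file in the question.
BROKEN_PREFIX = b'<?xml version="1.0" '


def parse_repaired(raw: bytes) -> ET.Element:
    """Parse XML bytes, dropping the duplicated declaration fragment if present."""
    raw = raw.lstrip()
    # Strip only when the fragment is directly followed by another
    # declaration, so files that are already well-formed are left alone.
    if raw.startswith(BROKEN_PREFIX + b'<?xml'):
        raw = raw[len(BROKEN_PREFIX):]
    return ET.fromstring(raw)


def parse_folder(folder: str) -> None:
    """Parse every *.xml file in a folder and print its <count> elements."""
    for path in sorted(Path(folder).glob('*.xml')):
        root = parse_repaired(path.read_bytes())
        for count in root.iter('count'):
            print(path.name, count.tag, count.attrib)
```

Calling `parse_folder('path/to/xml_dir')` would then walk all files without modifying anything on disk. If the broken fragment varies between files, the prefix check would need to be loosened accordingly.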
