I started learning Python a few days ago in order to build a basic site that compiles statistics from BOINC projects, e.g. SETI@home.
Basically, the site does the following:
- Download the .gz files
- Decompress the .gz files into XML files
- Parse the XML into data structures
- Write the data structures out to CSV files
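
For reference, the last step can be sketched with the standard `csv` module. This is only a minimal sketch: it assumes the data structure is a dictionary keyed by user id whose values hold the Name, CPID, TC and AC fields, and the function name `write_stats_csv` and the column order are hypothetical.

```python
import csv

# Column order is an assumption made for this illustration
FIELDS = ["ID", "Name", "CPID", "TC", "AC"]


def write_stats_csv(stats, path):
    """Write a {user_id: {"Name": ..., "CPID": ..., "TC": ..., "AC": ...}} dict to a CSV file."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for user_id, row in stats.items():
            # Merge the dictionary key into the row as the ID column
            writer.writerow({"ID": user_id, **row})
```

Calling `write_stats_csv(dic, "stats.csv")` would produce one header row followed by one row per user.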
In total there are 34 .gz files from 34 different BOINC projects.
All the code is now finished and works; however, the .gz file from one project refuses to parse, whereas the others work fine.
The file is user.gz from http://www.rnaworld.de/rnaworld/stats/
These are the errors that I am getting:

    Traceback (most recent call last):
      File "C:/Users/chris/PycharmProjects/testproject1/rnaw100.py", line 77, in <module>
        for event, elem in ET.iterparse(str(x_file_name2), events=("start", "end")):
      File "C:\Users\chris\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1227, in iterator
        yield from pullparser.read_events()
      File "C:\Users\chris\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1302, in read_events
        raise event
      File "C:\Users\chris\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1274, in feed
        self._parser.feed(data)
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

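
Since the error points at line 1, column 0, the parser is rejecting the very first byte, so the decompressed file may not be plain XML at all. A minimal sketch for checking what a file actually starts with is shown here. The helper name `sniff_format` is hypothetical, and looking at the first few bytes is only a heuristic, not a full validation.

```python
GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip stream
UTF8_BOM = b"\xef\xbb\xbf"    # optional byte order mark before XML text


def sniff_format(path):
    """Return 'gzip', 'xml', or 'unknown' based on the first bytes of a file."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    # Skip an optional UTF-8 byte order mark, then any leading whitespace
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "xml"
    return "unknown"
```

If this reports `'gzip'` for the file that was supposed to contain XML, the data was probably compressed more than once somewhere along the way.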
As a newbie, I am finding it difficult to understand what is wrong, because (a) the error refers to a Python core file, e.g. ElementTree.py, (b) I can't understand why a .gz file that many other BOINC stats sites are using won't work here, and (c) I can't see why my code works on the other files but not this one.
This is the code that downloads the .gz file and parses the XML (I have left out variable declarations etc.):

    import gzip
    import shutil
    import xml.etree.ElementTree as ET
    import requests

    # Download the .gz file and save it to disk
    response = requests.get(url2, stream=True)
    if response.status_code == 200:
        with open(target_path2, 'wb') as f:
            f.write(response.raw.read())

    # Decompress the .gz file into an XML file
    with gzip.open(target_path2, 'rb') as f_in:
        with open(x_file_name2, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

    # Stream through the XML and pick up the fields of each user
    for event, elem in ET.iterparse(str(x_file_name2), events=("start", "end")):
        if elem.tag == "total_credit" and event == "end":
            tc = float(elem.text)
            elem.clear()
        if elem.tag == "expavg_credit" and event == "end":
            ac = float(elem.text)
            elem.clear()
        if elem.tag == "id" and event == "end":
            id = elem.text
            elem.clear()
        if elem.tag == "cpid" and event == "end":
            cpid = elem.text
            elem.clear()
        if elem.tag == "name" and event == "end":
            name = elem.text
            elem.clear()
        teamid = TEAMID
        # Only keep users who belong to the team of interest
        if elem.tag == "teamid" and event == "end":
            if elem.text == TEAMID:
                cnt = cnt + 1
                dic[id] = {"Name": name, "CPID": cpid, "TC": tc, "AC": ac}
            elem.clear()
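
One possible explanation, which is an assumption rather than something confirmed for this server, is that the download ends up gzip-compressed twice. `response.raw` hands back the bytes as they were sent over the wire, so if the server adds `Content-Encoding: gzip` on top of the already compressed user.gz, a single pass through `gzip.open` would leave gzip data rather than XML. A minimal sketch that keeps decompressing while the data still carries the gzip signature follows. The function name `gunzip_fully` and the `max_layers` limit are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gunzip_fully(data, max_layers=5):
    """Decompress bytes repeatedly until they no longer look like gzip.

    Returns a (payload, layers) tuple, where layers is the number of
    gzip layers that were removed. max_layers is a safety limit.
    """
    layers = 0
    while data.startswith(GZIP_MAGIC) and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers
```

Reading the downloaded file as bytes and passing it through `gunzip_fully` would show how many layers were present; a result of 2 for this one project would be consistent with the assumption above.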