I want to turn the following data into a dataframe, de xml file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<file>
<SORT_INFO>
<sort_type>sort order</sort_type>
</SORT_INFO>
<ALL_INSTANCES>
<instance>
<ID>1</ID>
<start>11.24</start>
<end>31.24</end>
<code>19. Luis Alehandru Paz Mulato</code>
<label>
<group>Team</group>
<text>America de Cali</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>52.6</pos_x>
<pos_y>34.2</pos_y>
</instance>
<instance>
<ID>2</ID>
<start>11.24</start>
<end>31.24</end>
<code>19. Luis Alehandru Paz Mulato</code>
<label>
<group>Team</group>
<text>America de Cali</text>
</label>
<label>
<group>Action</group>
<text>Positional attacks</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>52.6</pos_x>
<pos_y>34.2</pos_y>
</instance>
<instance>
<ID>3</ID>
<start>14.43</start>
<end>34.43</end>
<code>6. Felipe Jaramillo Velasquez</code>
<label>
<group>Team</group>
<text>America de Cali</text>
</label>
<label>
<group>Action</group>
<text>Positional attacks</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>38.6</pos_x>
<pos_y>26.2</pos_y>
</instance>
</ALL_INSTANCES>
After </ALL_INSTANCES> there is a little bit more, but I only need the data inside the 'instance' tags as if those were the rows. So my columns would be as follows:
['ID', 'start', 'end', 'code', 'group1', 'text1', 'group2', 'text2', 'group3', 'text3', 'pos_x','pos_y']
and the first row would be:
['1', '11.24', '31.24', '19. Luis Alehandru Paz Mulato', 'Team', 'America de Cali', 'Action', 'Passes accurate', 'Half', '1st half', '52.6','34.2']
It is my firts time working with xml files, thank you so much for your help.
I already tried the following:
import xml.etree.ElementTree as et
xtree = et.parse("players_vs_nacional_0-3.xml")
xroot = xtree.getroot()
for node in xroot:
id_action = node.find("ID").text
start = node.find("start").text
end = node.find("end").text
code = node.find("code").text
posx = node.find("pos_x").text
posy = node.find("pos_y").text
But I got the error:
AttributeError: 'NoneType' object has no attribute 'text'
I believe it is because the first nodes don't contain what I was looking for ("ID", "start", "end", etc), but I couldn't get passed that and I am also having trouble trying to figure out how to get the info out of the tags because there are three of them for every and I need them all and in the same row.