I'm a beginner with Python and pandas DataFrames. I'm trying to use xml.dom.minidom to extract a table from an XML file into an Excel file. This is what the original table should look like (notice the blank entry under 'Bike'):
```
VEHICLE  BRAND
Car      Mercedes
Bike     Kawasaki
         Ducati
Truck    Ram
```
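The table above can also be viewed as a long-form table in which the vehicle name appears on every row and is then blanked wherever it repeats the row directly above it. This is an alternative to padding lists, not the approach used in the question. Here is a minimal pandas sketch, assuming the long-form columns have already been collected (the data is hard-coded for illustration):

```python
import pandas as pd

# Hypothetical long-form input: the vehicle name is repeated on every row
df = pd.DataFrame({
    "VEHICLE": ["Car", "Bike", "Bike", "Truck"],
    "BRAND": ["Mercedes", "Kawasaki", "Ducati", "Ram"],
})

# True where a VEHICLE value equals the value in the row directly above it
repeat = df["VEHICLE"].eq(df["VEHICLE"].shift())

# Replace those repeated names with an empty string, leaving df unchanged
out = df.assign(VEHICLE=df["VEHICLE"].mask(repeat, ""))
print(out)
```

Comparing against `shift()` blanks only consecutive repeats, so the same vehicle appearing again later in a separate group would still be shown.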
I am trying to extract this table from the given XML file:
```xml
<Info_Collection>
    <Info car="Car">
        <V_Collection>
            <Brand type="Mercedes"/>
        </V_Collection>
    </Info>
    <Info car="Bike">
        <V_Collection>
            <Brand type="Kawasaki"/>
            <Brand type="Ducati"/>
        </V_Collection>
    </Info>
    <Info car="Truck">
        <V_Collection>
            <Brand type="Ram"/>
        </V_Collection>
    </Info>
</Info_Collection>
```
This is the code that I am using:
```python
import xml.dom.minidom

import pandas as pd


def main():
    x1 = []
    x2 = []
    doc = xml.dom.minidom.parse('xml_file')
    t1 = doc.getElementsByTagName("Info")
    t2 = doc.getElementsByTagName("Brand")
    for a in t1:
        x1.append(a.getAttribute("car"))
    for a in t2:
        x2.append(a.getAttribute("type"))
    # Pad the VEHICLE list at the end until both columns are the same length
    while len(x1) < len(x2):
        x1.append("")
    boDF = pd.DataFrame({'VEHICLE': x1, 'BRAND': x2})
    # 'output.xlsx' is a placeholder file name
    with pd.ExcelWriter('output.xlsx') as writer:
        boDF.to_excel(writer, sheet_name='Sheet1', index=False, startrow=1)


if __name__ == "__main__":
    main()
```
After running it, the output table is as follows:
```
VEHICLE  BRAND
Car      Mercedes
Bike     Kawasaki
Truck    Ducati
         Ram
```
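The shift happens because `x1` and `x2` are filled independently, and the DataFrame pairs them purely by position: Ducati, the second brand under Bike, lands next to the third vehicle. This minimal sketch reproduces that pairing with plain lists. It uses `zip_longest` to stand in for the end-padding done by the while loop:

```python
from itertools import zip_longest

# The two flat lists that the loops in the question produce
x1 = ["Car", "Bike", "Truck"]
x2 = ["Mercedes", "Kawasaki", "Ducati", "Ram"]

# Pair values by position, padding the shorter list at the end with ""
rows = list(zip_longest(x1, x2, fillvalue=""))
for vehicle, brand in rows:
    print(f"{vehicle:<8} {brand}")
```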
Could someone help me figure out how to insert a blank cell between 'Bike' and 'Truck' in the VEHICLE column? I tried running both for loops at the same time and comparing the lengths of the two lists, so that whenever they differed a blank entry would be added to the first column, but I couldn't get it to work. I know the while loop in my code appends blanks to the end of the first column, but I can't figure out how to insert one in the middle of the column.
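Instead of comparing the lengths of two independently built lists, one option is to walk each Info element together with its own Brand children, so the blank goes in exactly where a group has more than one brand. Here is a minimal sketch, assuming the XML is passed in as a string. The names `vehicle_brand_columns` and `xml_text` are hypothetical, and for a file `xml.dom.minidom.parse(path)` can replace `parseString`:

```python
import xml.dom.minidom


def vehicle_brand_columns(xml_text):
    """Return equal-length VEHICLE and BRAND lists, one entry per Brand.

    The car name is written only on the first row of each Info group;
    the group's remaining rows get an empty string instead.
    """
    doc = xml.dom.minidom.parseString(xml_text)
    x1, x2 = [], []
    for info in doc.getElementsByTagName("Info"):
        car = info.getAttribute("car")
        # Searching from the Info element finds only the Brands nested in it
        for i, brand in enumerate(info.getElementsByTagName("Brand")):
            x1.append(car if i == 0 else "")
            x2.append(brand.getAttribute("type"))
    return x1, x2


xml_text = """<Info_Collection>
  <Info car="Car"><V_Collection><Brand type="Mercedes"/></V_Collection></Info>
  <Info car="Bike"><V_Collection>
    <Brand type="Kawasaki"/><Brand type="Ducati"/>
  </V_Collection></Info>
  <Info car="Truck"><V_Collection><Brand type="Ram"/></V_Collection></Info>
</Info_Collection>"""

x1, x2 = vehicle_brand_columns(xml_text)
print(x1)  # ['Car', 'Bike', '', 'Truck']
print(x2)  # ['Mercedes', 'Kawasaki', 'Ducati', 'Ram']
```

Since `x1` and `x2` now have the same length, they can go straight into `pd.DataFrame({'VEHICLE': x1, 'BRAND': x2})` and `to_excel` without any padding loop. Note that an Info element with no Brand children produces no row in this sketch.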