I have an XML file whose data I want to load into a pandas.DataFrame using lxml.objectify.
File: students.xml

<?xml version="1.0" encoding="UTF-8"?>

<college>
    <department>
        <name>Information Technology</name>
        <semester>
            <sem_3>
                <student_no>1</student_no>
                <student_name>Ravindra</student_name>
                <student_city>Ahmedabad</student_city>
            </sem_3>
        </semester>
    </department>
    <department>
        <name>Computer Engineering</name>
        <semester>
            <sem_3>
                <student_no>2</student_no>
                <student_name>Surya</student_name>
                <student_city>Gandhinagar</student_city>
            </sem_3>
        </semester>
    </department>
</college>

I tried the following, but could only get this output:

import pandas as pd
from lxml import objectify
from pandas import DataFrame
xml = objectify.parse(open('students.xml'))
root = xml.getroot()
number = []
name = []
city = []
for i in range(0, 2):
  obj = root.getchildren()[i].getchildren()
  for j in range(0, 1):
    child_obj = obj[1].getchildren()[j].getchildren()
    number.append(child_obj[0])
    name.append(child_obj[1])
    city.append(child_obj[2])
df = pd.DataFrame(list(zip(number, name, city)), columns =['student_no', 'student_name', 'student_city'])
print(df)
-----------------------------------------------
  student_no    student_name       student_city
0    [[[1]]]  [[[Ravindra]]]    [[[Ahmedabad]]]
1    [[[2]]]     [[[Surya]]]  [[[Gandhinagar]]]
-----------------------------------------------

What I want, but can't get, is output like this:

-----------------------------------------------
  student_no    student_name       student_city
0          1        Ravindra          Ahmedabad
1          2           Surya        Gandhinagar
-----------------------------------------------

Can you help me with this?

1 Answer

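You were appending the lxml element objects themselves to your lists instead of their values. Read each element's `.text` (and convert it with `int()` where you want a number) before appending: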

import pandas as pd
from lxml import objectify

with open('students.xml') as f:
    xml = objectify.parse(f)
root = xml.getroot()

number = []
name = []
city = []
for i in range(0, 2):  # the two <department> elements
    obj = root.getchildren()[i].getchildren()
    for j in range(0, 1):  # the single <sem_3> inside <semester>
        child_obj = obj[1].getchildren()[j].getchildren()
        # .text gives the plain string, not the lxml element
        number.append(int(child_obj[0].text))
        name.append(child_obj[1].text)
        city.append(child_obj[2].text)

data = {'student_no': number, 'student_name': name, 'student_city': city}
df = pd.DataFrame(data)
print(df)

outputs:

  student_no student_name student_city
0          1     Ravindra    Ahmedabad
1          2        Surya  Gandhinagar
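
The loop above hard-codes the number of departments and the position of each child. As a rough sketch of an alternative (not the only way to do it), you could walk every `<sem_3>` element with `iter()` and read the children by name. The inline `XML` bytes and the `rows` list are assumptions made here so the example runs on its own; with the real file you would parse `students.xml` instead.

```python
import pandas as pd
from lxml import objectify

# Inline copy of students.xml (an assumption, so the sketch is self-contained).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<college>
    <department>
        <name>Information Technology</name>
        <semester>
            <sem_3>
                <student_no>1</student_no>
                <student_name>Ravindra</student_name>
                <student_city>Ahmedabad</student_city>
            </sem_3>
        </semester>
    </department>
    <department>
        <name>Computer Engineering</name>
        <semester>
            <sem_3>
                <student_no>2</student_no>
                <student_name>Surya</student_name>
                <student_city>Gandhinagar</student_city>
            </sem_3>
        </semester>
    </department>
</college>
"""

root = objectify.fromstring(XML)

rows = []
# iter("sem_3") visits every <sem_3> element, however many departments exist.
for sem in root.iter("sem_3"):
    rows.append({
        "student_no": int(sem.student_no.text),   # plain int, not an lxml element
        "student_name": sem.student_name.text,    # plain str
        "student_city": sem.student_city.text,
    })

df = pd.DataFrame(rows)
print(df)
```

This only picks up `<sem_3>` elements; other semester tags would need their own tag name or a broader search.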
woblob
    To add to this, you can check the data type of any child_obj, e.g. `type(child_obj[0])`, which returns `<class 'lxml.objectify.IntElement'>`. Another solution is to typecast before you append, in this case `int(child_obj[0])`, `str(child_obj[1])` and `str(child_obj[2])`. – Uchiha012 Sep 29 '20 at 07:14
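
A small sketch of the type check and the casts mentioned in the comment. The one-element XML snippet and the variable names (`no`, `name`, `as_int`, `as_str`) are made up here purely for illustration.

```python
from lxml import objectify

# Tiny stand-in for one <sem_3> element (an assumption for illustration).
root = objectify.fromstring(
    b"<sem_3><student_no>1</student_no>"
    b"<student_name>Ravindra</student_name></sem_3>"
)

no = root.student_no
name = root.student_name

# objectify wraps values in its own element classes
print(type(no).__name__)
print(type(name).__name__)

# Casting turns them into plain Python values before they go into a list
as_int = int(no)
as_str = str(name)
print(type(as_int).__name__, type(as_str).__name__)
```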