The XML file looks like this:

<employees>
    <employee>
        <id>303</id>
        <name>varma</name>
        <age>20</age>
        <salary>120000</salary>
        <division>3</division>
    </employee>
    <employee>
        <id>304</id>
        <name>Cyril</name>
        <age>20</age>
        <salary>900000</salary>
        <division>3</division>
    </employee>
    <employee>
        <id>305</id>
        <name>Yojith</name>
        <age>20</age>
        <salary>900000</salary>
        <division>3</division>
    </employee>
</employees>

I want to output this as CSV or another tabular format without using any libraries.

I have managed it with libraries, but I can't work out how to do it without them. My idea is: 1. convert the XML to a dictionary; 2. convert the dictionary to CSV.

  • Hi. Why not use libraries? Are the standard libraries okay? If “no library” means no libraries at all, you're faced with writing your own XML parser and CSV encoder. – Zach Young Jun 03 '23 at 16:07

2 Answers

I would recommend the pandas read_xml() and to_csv() functions, a 3-liner:

Compare the documentation: to_csv, read_xml

import pandas as pd

df = pd.read_xml('employee.xml')
df.to_csv('out.csv', index=False)

Output (CSV file):

id,name,age,salary,division
303,varma,20,120000,3
304,Cyril,20,900000,3
305,Yojith,20,900000,3
Hermann12

I recommend just using libraries because they're usually very optimised. I'll talk about that later. For now, here's a way that utilises the xml.dom.minidom module, which is a part of the Python standard library, so no additional libraries are required.

Edit: rewrote the last part using the standard CSV library instead of manually writing the file, as suggested by a comment. That makes for 2 Python built-in modules, not 1. The original code for the CSV writing will be at the end of the reply, if you're interested.

from csv import DictWriter
from xml.dom import minidom

# Step 1: Read and parse the XML file
# (xml_data can also be a string literal instead of being read from a file)
with open('employees.xml', 'r') as xml_file:
    xml_data = xml_file.read()

dom = minidom.parseString(xml_data)
employees = dom.getElementsByTagName('employee')

# Step 2: Extract the required information
data = []
for employee in employees:
    emp_data = {}
    for child in employee.childNodes:
        # Skip the whitespace text nodes between the child elements
        if child.nodeType == minidom.Node.ELEMENT_NODE:
            # An empty element such as <name/> has no text node, so fall back to ''
            text_node = child.firstChild
            emp_data[child.tagName] = text_node.data if text_node else ''
    data.append(emp_data)

# Step 3: Write the extracted information to a CSV file
# newline='' lets the csv module control the line endings itself
with open('output.csv', 'w', newline='') as csv_file:
    fieldnames = ['id', 'name', 'age', 'salary', 'division']
    writer = DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerows(data)
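
The header in Step 3 is hard-coded, which only works while every record has exactly those five tags. Here is a rough sketch of one way to derive the header from the data instead. The sample records (including the `email` key) are made up for illustration, and `io.StringIO` stands in for the real output file so the result can be inspected directly.

```python
import csv
import io

# Hypothetical records: the second one lacks 'division' and has an extra 'email'.
data = [
    {'id': '303', 'name': 'varma', 'age': '20', 'salary': '120000', 'division': '3'},
    {'id': '304', 'name': 'Cyril', 'age': '20', 'salary': '900000', 'email': 'c@example.com'},
]

# Collect every key in first-seen order; dict.fromkeys drops duplicates
# while keeping insertion order.
fieldnames = list(dict.fromkeys(key for row in data for key in row))

buffer = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
# restval is the value written for keys that a record does not have.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(data)

csv_text = buffer.getvalue()
print(csv_text)
```

Without this, DictWriter raises a ValueError as soon as a record contains a key that is not in fieldnames, unless you pass extrasaction='ignore'.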


Don't reinvent the wheel, just realign it.

— Anthony J. D'Angelo, I think

I recommend NOT using this code. You should really just use lxml. It's extremely simple and easy to use and can handle complex XML structures with nested elements and attributes. Let me know how everything goes!
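
If installing lxml is not an option, the standard library's xml.etree.ElementTree has a similar element-tree API (lxml.etree is largely modelled on it). The following is only a sketch using ElementTree rather than lxml itself; the function name `employees_to_rows` and the inline sample are my own, chosen for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question, inlined so the sketch is self-contained.
XML_TEXT = """<employees>
    <employee>
        <id>303</id>
        <name>varma</name>
        <age>20</age>
        <salary>120000</salary>
        <division>3</division>
    </employee>
    <employee>
        <id>304</id>
        <name>Cyril</name>
        <age>20</age>
        <salary>900000</salary>
        <division>3</division>
    </employee>
</employees>"""


def employees_to_rows(xml_text):
    """Return one dict per <employee>, mapping each child tag to its text."""
    root = ET.fromstring(xml_text)
    rows = []
    for employee in root.iter('employee'):
        # child.text is None for an empty element, so fall back to ''.
        rows.append({child.tag: (child.text or '').strip() for child in employee})
    return rows


rows = employees_to_rows(XML_TEXT)

out = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=['id', 'name', 'age', 'salary', 'division'])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Iterating over an element yields its child elements only, so there are no whitespace text nodes to filter out as there are with minidom.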


Original CSV-writing code, without the csv module:
# Step 3: Write the extracted information to a CSV file
with open('output.csv', 'w') as f:
    f.write('id,name,age,salary,division\n')
    for emp_data in data:
        f.write(f"{emp_data['id']},{emp_data['name']},{emp_data['age']},{emp_data['salary']},{emp_data['division']}\n")
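
Note that the manual version above writes values as they are, so a name containing a comma or a quote would break the row. If you really must avoid the csv module, you need to do the quoting yourself. Below is a minimal sketch of the usual rule (wrap the field in double quotes and double any embedded quotes). The helpers `csv_field` and `csv_line` and the sample record are hypothetical, not part of the original code.

```python
def csv_field(value):
    """Quote a value only when it contains a comma, a quote or a line break."""
    text = str(value)
    if any(ch in text for ch in ',"\r\n'):
        # Embedded double quotes are escaped by doubling them.
        return '"' + text.replace('"', '""') + '"'
    return text


def csv_line(values):
    """Join already-ordered values into one CSV line."""
    return ','.join(csv_field(v) for v in values) + '\n'


fieldnames = ['id', 'name', 'age', 'salary', 'division']

# Hypothetical record whose name contains both a comma and quotes.
record = {'id': '306', 'name': 'Doe, "JD" Jane', 'age': '20',
          'salary': '100000', 'division': '3'}

line = csv_line(record[name] for name in fieldnames)
print(line)
```

In the loop above, f.write(csv_line(emp_data[name] for name in fieldnames)) would then replace the long f-string.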
AbdelRahman
  • I’d add the csv module and use its DictWriter with writerows(data), and you’re done. At the very least you’ll get proper quoting if the values in the XML contain commas or quotes, and line breaks will be quoted properly if they happen to exist. – Zach Young Jun 03 '23 at 16:04
  • I tried to keep the number of imports as low as possible but, to be honest, it doesn't really matter since I already have one. csv is part of the standard library anyway. You're right. I'll change my code when I'm on my PC. – AbdelRahman Jun 03 '23 at 19:29
  • @ZachYoung Done. Thank you so much for the suggestion. – AbdelRahman Jun 03 '23 at 21:26