How to iterate over a directory of XML files and extract data in Python

Question

I need to read XML files and load the data into a DataFrame. I have written the following code, which extracts the data from a single XML file.

import pandas as pd
import numpy as np
import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
import datetime

tree=et.parse('/data/dump_xml/1013.xml')
root=tree.getroot()

NAME = []
for name in root.iter('name'):
    NAME.append(name.text)
print(NAME[0])
print(NAME[1])


UPDATE = []
for update in root.iter('lastupdate'):
    UPDATE.append(update.text)
updated = datetime.datetime.fromtimestamp(int(UPDATE[0]))

lastupdate=updated.strftime('%Y-%m-%d %H:%M:%S')

ParaValue = []
for parameterevalue in root.iter('value'):
    ParaValue.append(parameterevalue.text)
print(ParaValue[0])
print(ParaValue[1])

print(lastupdate,NAME[0],ParaValue[0])
print(lastupdate,NAME[1],ParaValue[1])

For each file, I need to get the two results below as output:

2022-05-23 11:25:01  in   1.5012356187e+05 
2022-05-23 11:25:01  out   1.7723777592e+05

Now I need to do this for all the XML files in /data/dump_xml/ and build a single DataFrame with all the data in one execution. Can someone help me improve my code?

Encapsulate your XML processing in a function and call this function for each file contained in the directory. As this is a multi-step approach, there might be several sub-questions related to your question. Where are you stuck exactly? — albert, Jun 22 '22 at 08:42
See https://stackoverflow.com/a/10378012/3991125 for a brief overview of possibilities to iterate over a directory. — albert, Jun 22 '22 at 08:45
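
The approach suggested in the comments could look roughly like the sketch below. It assumes every file has the same layout as the one in the question: a single lastupdate element holding a Unix timestamp, and name and value elements that pair up by position. The function names extract_rows, collect_rows and build_dataframe, the column names, and the conversion of values to float are illustrative choices, not something taken from the question.

```python
import datetime
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_rows(xml_path):
    """Return one (lastupdate, name, value) tuple per <name> element in a file."""
    root = ET.parse(xml_path).getroot()

    # First <lastupdate> element, read as a Unix timestamp (as in the question).
    stamp = int(next(root.iter("lastupdate")).text.strip())
    lastupdate = datetime.datetime.fromtimestamp(stamp).strftime("%Y-%m-%d %H:%M:%S")

    names = [(el.text or "").strip() for el in root.iter("name")]
    values = [(el.text or "").strip() for el in root.iter("value")]

    # Assumption: the i-th <name> belongs to the i-th <value>.
    # zip() stops at the shorter list, so surplus values are ignored.
    return [(lastupdate, name, float(value)) for name, value in zip(names, values)]


def collect_rows(directory):
    """Run extract_rows on every *.xml file directly inside the directory."""
    rows = []
    for xml_path in sorted(Path(directory).glob("*.xml")):
        rows.extend(extract_rows(xml_path))
    return rows


def build_dataframe(directory):
    """Put the rows of all files into one DataFrame (column names are hypothetical)."""
    import pandas as pd  # imported here so the helpers above work without pandas

    return pd.DataFrame(collect_rows(directory), columns=["lastupdate", "name", "value"])
```

Calling build_dataframe('/data/dump_xml') should then give one row per name and file. If some files may be malformed, wrapping the extract_rows call in a try/except for ET.ParseError would let the loop skip them instead of stopping.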
