
I am fairly new to Python and am looking for some guidance on my code. I have multiple XML files that I would like to combine into ONE data frame. Each XML file is one record. My code below puts the data into multiple data frames, but I am trying to find a way to append the rows to one data frame. Any help is greatly appreciated!

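Here is my existing code:

```python
from bs4 import BeautifulSoup
import lxml  # parser backend used by BeautifulSoup's "xml" mode
import pandas as pd
import os
import xml.etree.ElementTree as et

path = '/app/notebooks'
pd.set_option('display.max_columns', None)  # show every column when displaying

for filename in os.listdir(path):
    if filename.endswith('.xml'):
        fullname = os.path.join(path, filename)
        with open(fullname, "r") as f:  # close each file after parsing
            soup = BeautifulSoup(f, "xml")
        # One dict per file: direct children of <RECORDING> -> {tag name: text}
        d = {}
        for tag in soup.RECORDING.find_all(recursive=False):
            d[tag.name] = tag.get_text(strip=True)
        # A new one-row DataFrame is created and displayed on every iteration,
        # which is why the output is several separate frames.
        df = pd.DataFrame([d])
        display(df)  # display() is provided by Jupyter/IPython
```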

Current output: two separate data frames, but I'm trying to output one data frame with two records.
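A minimal sketch of getting one frame out of a loop like this: collect each file's dict in a list and call `pd.DataFrame` once, after the loop, instead of building a frame per file. Assumptions: each file holds one `<RECORDING>` element whose direct children are the fields, with no XML namespaces; it uses the standard-library `xml.etree.ElementTree` (already imported in the question) in place of BeautifulSoup so no third-party parser is needed. The function name `load_recordings` and the sample files are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as et

import pandas as pd


def load_recordings(path):
    """Parse every .xml file in `path` into one row of a single DataFrame."""
    rows = []  # one dict per file; the DataFrame is built once, after the loop
    for filename in sorted(os.listdir(path)):
        if not filename.endswith(".xml"):
            continue
        root = et.parse(os.path.join(path, filename)).getroot()
        # Assumption: <RECORDING> is the root element or nested somewhere below it.
        record = root if root.tag == "RECORDING" else root.find(".//RECORDING")
        if record is None:
            continue  # skip files that contain no <RECORDING> element
        # Like tag.get_text(strip=True): join the stripped text pieces of each
        # direct child (including any text in that child's descendants).
        rows.append(
            {child.tag: "".join(s.strip() for s in child.itertext()) for child in record}
        )
    return pd.DataFrame(rows)


# Hypothetical sample files standing in for the real records.
samples = {
    "a.xml": "<RECORDING><TITLE> First </TITLE><ARTIST>X</ARTIST></RECORDING>",
    "b.xml": "<RECORDING><TITLE>Second</TITLE><ARTIST>Y</ARTIST></RECORDING>",
}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
            fh.write(text)
    df = load_recordings(tmp)

print(df)  # one DataFrame with one row per file
```

With the question's folder, the call would be `df = load_recordings('/app/notebooks')`. Building the frame once from a list of dicts also avoids re-copying the data on every iteration.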

JK34
  • Either create an empty dataframe and append to it in a loop, or append the individual dataframes to a list and then merge them after the loop completes – G. Anderson Apr 12 '21 at 18:35
  • Thank you! That answered my question. – JK34 Apr 12 '21 at 19:16
  • In that case I'll suggest this be closed as a duplicate of [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101), glad I could help! – G. Anderson Apr 12 '21 at 20:42
  • Do note: the forthcoming pandas v1.3 (to be released May 31, 2021) has a new method: [`read_xml`](https://pandas.pydata.org/pandas-docs/dev/user_guide/io.html#io-read-xml). If the XML is fairly flat, it can easily parse the content into a data frame. If not, use a stylesheet via `lxml` to run XSLT to flatten it for import. No loops needed! – Parfait Apr 12 '21 at 21:54
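The two comment suggestions can be combined: read each file into its own small frame with `pd.read_xml`, collect the frames in a list, and call `pd.concat` once at the end. `pd.concat` (rather than `merge`, which joins on keys) is the pandas function that stacks rows. A sketch assuming pandas 1.3 or later, that `<RECORDING>` is each file's root element, and that its children are flat text fields; `load_with_read_xml` and the sample data are hypothetical. `parser="etree"` uses the standard library, so `lxml` is not required for this flat case.

```python
import os
import tempfile

import pandas as pd  # pd.read_xml requires pandas >= 1.3


def load_with_read_xml(path):
    """Read each .xml file with pd.read_xml and stack the rows into one frame."""
    frames = []  # one small DataFrame per file
    for filename in sorted(os.listdir(path)):
        if filename.endswith(".xml"):
            # Assumption: <RECORDING> is the root, so xpath="." selects it as the
            # single row and its child elements become the columns.
            frames.append(
                pd.read_xml(os.path.join(path, filename), xpath=".", parser="etree")
            )
    if not frames:
        return pd.DataFrame()  # pd.concat raises on an empty list
    # Concatenate once, after the loop, with a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample files standing in for the real records.
samples = {
    "a.xml": "<RECORDING><TITLE>First</TITLE><ARTIST>X</ARTIST></RECORDING>",
    "b.xml": "<RECORDING><TITLE>Second</TITLE><ARTIST>Y</ARTIST></RECORDING>",
}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
            fh.write(text)
    df = load_with_read_xml(tmp)

print(df)
```

The list-then-concat pattern is preferable to growing a frame inside the loop: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0.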
