
I'm fairly new to Python and am hoping to get some help transforming an XML file into a pandas DataFrame. I have searched other resources but am still stuck. I want to get all the fields between the <RECORDING> tags into a table. Any help is greatly appreciated! Thank you.

Below is the code I tried, but it is not working properly.

import xml.etree.ElementTree as ET
import pandas as pd

xml_data = open('5249009-08-34-59-126029.xml', 'r').read()
root = ET.XML(xml_data)

data = []
cols = []
for i, child in enumerate(root):
    data.append([subchild.text for subchild in child])
    cols.append(child.tag)

df = pd.DataFrame(data).T 
df.columns = cols 

print(df)

Below is sample input data:

<?xml version="1.0"?>

<RECORDING>

<IDENT>0</IDENT>

<DEVICEID>133242232</DEVICEID>

<DEVICEALIAS>52232009</DEVICEALIAS>

<GROUP>1823481655</GROUP>

<GATE>1011655</GATE>

<ANI>7777777777</ANI>

<DNIS>777777777</DNIS>

<USER1>00:07:53.2322691,00:03:21.34232761</USER1>

<USER2>text</USER2>

<USER3/>

<USER4/>

<USER5>34fc0a8d-d5632c9b1</USER5>

<USER6>000dfsdf98701596638094</USER6>

<USER7>97</USER7>

<USER8>00701596638094</USER8>

<USER9>10155</USER9>

<USER10/>

<USER11/>

<USER12/>

<USER13>Text</USER13>

<USER14>4</USER14>

<USER15>10</USER15>

<CALLSEGMENTID/>

<CALLID>9870</CALLID>

<FILENAME>\\folderpath\folderpath\folderpath\folderpath\2020\Aug\05\5249009\52343109-234234-34-59-1234234029</FILENAME>

<DURATION>201</DURATION>

<STARTYEAR>2020</STARTYEAR>

<STARTMONTH>08</STARTMONTH>

<STARTMONTHNAME>August</STARTMONTHNAME>

<STARTDAY>05</STARTDAY>

<STARTDAYNAME>Wednesday</STARTDAYNAME>

<STARTHOUR>08</STARTHOUR>

<STARTMINUTE>34</STARTMINUTE>

<STARTSECOND>59</STARTSECOND>

<PRIORITY>50</PRIORITY>

<RECORDINGTYPE>S</RECORDINGTYPE>

<CALLDIRECTION>I</CALLDIRECTION>

<SCREENCAPTURE>7</SCREENCAPTURE>

<KEEPCALLFORDAYS>90</KEEPCALLFORDAYS>

<BLACKOUTREMOTEAUDIO>false</BLACKOUTREMOTEAUDIO>

<BLACKOUTS/>

</RECORDING>
JK34JK34
  • https://stackoverflow.com/a/59074604/6366770 – David Erickson Apr 09 '21 at 19:18
  • Does this answer your question? [How to convert an XML file to nice pandas dataframe?](https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe) – Guillaume Ansanay-Alex Apr 09 '21 at 19:22
  • David Erickson - Thank you for providing this link. However, it seems his data is set up differently and has multiple rows. Mine is only 1 row with multiple columns. – JK34JK34 Apr 09 '21 at 19:26

1 Answer

One possible way to parse the file:

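import pandas as pd
from bs4 import BeautifulSoup

# The "xml" parser requires the lxml package to be installed.
with open("your_file.xml", "r") as f:
    soup = BeautifulSoup(f, "xml")

# Map each direct child of <RECORDING> to its text; empty tags become "".
d = {}
for tag in soup.RECORDING.find_all(recursive=False):
    d[tag.name] = tag.get_text(strip=True)

# A list holding one dict produces a one-row DataFrame.
df = pd.DataFrame([d])
print(df)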

Prints:

  IDENT   DEVICEID DEVICEALIAS       GROUP     GATE         ANI       DNIS                               USER1 USER2 USER3 USER4               USER5                   USER6 USER7           USER8  USER9 USER10 USER11 USER12 USER13 USER14 USER15 CALLSEGMENTID CALLID                                           FILENAME DURATION STARTYEAR STARTMONTH STARTMONTHNAME STARTDAY STARTDAYNAME STARTHOUR STARTMINUTE STARTSECOND PRIORITY RECORDINGTYPE CALLDIRECTION SCREENCAPTURE KEEPCALLFORDAYS BLACKOUTREMOTEAUDIO BLACKOUTS
0     0  133242232    52232009  1823481655  1011655  7777777777  777777777  00:07:53.2322691,00:03:21.34232761  text              34fc0a8d-d5632c9b1  000dfsdf98701596638094    97  00701596638094  10155                        Text      4     10                 9870  \\folderpath\folderpath\folderpath\folderpath\...      201      2020         08         August       05    Wednesday        08          34          59       50             S             I             7              90               false          
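
A similar result can probably be had with the standard library's xml.etree.ElementTree, which the question already imports. The sketch below is an alternative to the BeautifulSoup approach, not part of it. It assumes the file holds a single <RECORDING> root whose direct children are the fields. The inline XML is a shortened, hypothetical stand-in for the real file, and record_to_dict is a helper name made up for illustration.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical, shortened sample; the real file has many more fields.
xml_data = """<?xml version="1.0"?>
<RECORDING>
<IDENT>0</IDENT>
<DEVICEID>133242232</DEVICEID>
<USER3/>
<DURATION>201</DURATION>
</RECORDING>"""


def record_to_dict(root):
    """Map each direct child tag of the root to its text ('' for empty tags)."""
    return {child.tag: (child.text or "").strip() for child in root}


root = ET.fromstring(xml_data)

# Wrapping the dict in a list gives one row with one column per tag.
df = pd.DataFrame([record_to_dict(root)])
print(df)
```

The code in the question loops over the grandchildren of the root. In this file the fields are direct children with no nested elements, so each inner list comes out empty. Reading child.tag and child.text directly avoids that.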
Andrej Kesely
  • @Andrej Kesely - how would I modify the code to loop through multiple XML files in the same directory? – JK34JK34 Apr 09 '21 at 20:35
  • @JK34JK34 Try the examples in this answer: https://stackoverflow.com/questions/18262293/how-to-open-every-file-in-a-folder You can then append the dictionaries to a list and use the list in `df = pd.DataFrame(lst)` – Andrej Kesely Apr 09 '21 at 20:37
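
The suggestion in the last comment (collect one dictionary per file, then pass the list to pd.DataFrame) could look roughly like the sketch below. It is only an illustration under some assumptions: every *.xml file in the folder has the same single-record layout as the sample, parsing is done with the standard library's ElementTree rather than BeautifulSoup, and the load_recordings function and the SOURCE_FILE column are hypothetical names that appear in neither the question nor the answer.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

import pandas as pd


def load_recordings(directory):
    """Build a DataFrame with one row per *.xml file found in `directory`."""
    rows = []
    # sorted() keeps the row order stable from run to run.
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(path).getroot()
        # One dict per file: direct child tag -> text ('' for empty tags).
        row = {child.tag: (child.text or "").strip() for child in root}
        # Extra column (an assumption) so each row can be traced to its file.
        row["SOURCE_FILE"] = path.name
        rows.append(row)
    return pd.DataFrame(rows)


# Example call; "xml_folder" is a placeholder directory name.
# df = load_recordings("xml_folder")
```

If a tag is missing from some files, pandas fills that cell with NaN, so the files do not need identical sets of fields.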