
I have written the following function, which converts an XML file to a DataFrame. It works well for files smaller than 1 GB, but for anything larger the session runs out of RAM (13 GB on Google Colab) and crashes. The same happens when I run it locally in a Jupyter Notebook (4 GB laptop RAM). Is there a way to optimize the code?

Code

# Libraries
import pandas as pd
import xml.etree.ElementTree as ET  # cElementTree is deprecated and was removed in Python 3.9

# Function to convert an XML file to a pandas DataFrame
def xml2df(file_path):

    # Parse the XML file and obtain the root
    # NOTE: ET.parse() loads the whole document into memory, and `root` is never used below
    tree = ET.parse(file_path)
    root = tree.getroot()

    dict_list = []

    for _, elem in ET.iterparse(file_path, events=("end",)):
        if elem.tag == "row":
            dict_list.append(dict(elem.attrib))  # PARSE ALL ATTRIBUTES (copied before clear())
            elem.clear()

    df = pd.DataFrame(dict_list)
    return df
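One likely culprit is that the function calls `ET.parse()` first, which builds the entire element tree in memory before `iterparse` even starts, even though `root` is never used afterwards. In addition, `elem.clear()` alone still leaves an empty element attached to the root for every row. A minimal sketch of a streaming variant, assuming every record is a `<row>` element carrying its data in attributes (as in the excerpt below): it skips `ET.parse()`, copies each attribute dict, and uses the common `root.clear()` pattern with `iterparse` so finished elements can be freed. The function name `xml2df_streaming` and the `row_tag` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def xml2df_streaming(file_path, row_tag="row"):
    """Collect the attributes of every <row_tag> element into a DataFrame.

    The full element tree is never built: only the current row is kept
    attached to the root, and the root is cleared after each row.
    """
    records = []
    # Request "start" events as well so the root element can be captured first.
    context = ET.iterparse(file_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == row_tag:
            # Copy the attribute dict before the element is discarded.
            records.append(dict(elem.attrib))
            # Detach finished children from the root so they can be garbage-collected.
            root.clear()
    return pd.DataFrame(records)
```

This bounds the memory used by the parse tree, but `records` and the resulting DataFrame still hold every row, so a file whose data exceeds the available RAM will still not fit.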

Part of the XML file ('Badges.xml'):

<badges>
  <row Id="82946" UserId="3718" Name="Teacher" Date="2008-09-15T08:55:03.923" Class="3" TagBased="False" />
  <row Id="82947" UserId="994" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
  <row Id="82949" UserId="3893" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
  <row Id="82950" UserId="4591" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
  <row Id="82951" UserId="5196" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
  <row Id="82952" UserId="2635" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
  <row Id="82953" UserId="1113" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
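If even the finished DataFrame is too large (for example on the 4 GB laptop), one option is to stream the attributes straight to a CSV file with `csv.DictWriter`, one row at a time, and read them back with compact dtypes, so the rows never pile up as Python dicts. This is a sketch under stated assumptions: the column list is taken from this excerpt and assumed to be present on every row, and the names `xml_rows_to_csv`, `load_badges` and `BADGE_COLUMNS`, as well as the chosen dtypes, are hypothetical choices rather than part of the original code.

```python
import csv
import xml.etree.ElementTree as ET

import pandas as pd

# Assumed schema, taken from the Badges.xml excerpt; other files need their own list.
BADGE_COLUMNS = ["Id", "UserId", "Name", "Date", "Class", "TagBased"]


def xml_rows_to_csv(xml_source, csv_path, columns, row_tag="row"):
    """Write the attributes of every <row_tag> element to csv_path.

    Rows are written as they are parsed, so memory use stays roughly constant.
    Returns the number of rows written.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        # Unknown attributes are ignored; missing ones are written as empty fields.
        writer = csv.DictWriter(out, fieldnames=columns,
                                extrasaction="ignore", restval="")
        writer.writeheader()
        context = ET.iterparse(xml_source, events=("start", "end"))
        _, root = next(context)  # grab the root element from its "start" event
        for event, elem in context:
            if event == "end" and elem.tag == row_tag:
                writer.writerow(elem.attrib)
                root.clear()  # free the finished row element
                count += 1
    return count


def load_badges(csv_path):
    """Read the CSV back with smaller dtypes (assumes no missing values).

    TagBased is left to pandas' default inference, which reads True/False as bool.
    """
    return pd.read_csv(
        csv_path,
        dtype={"Id": "int32", "UserId": "int32",
               "Name": "category", "Class": "int8"},
        parse_dates=["Date"],
    )
```

If some rows lack a column, the plain `int32`/`int8` dtypes will fail on the empty fields; pandas' nullable types such as `"Int32"` are one way around that. `pd.read_csv` also accepts `usecols` and `chunksize` when only part of the data is needed at once.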
Subhawna
  • This is nearly the same as an [exact post](https://stackoverflow.com/q/62578671/1422451) I answered a few months ago. Please advise whether you are the same account holder or in the same class/workplace as the poster. – Parfait Aug 06 '20 at 15:28
  • How are you calling this method? You only define it here. Are you iterating through multiple large XML files? – Parfait Aug 06 '20 at 15:34
  • @Parfait We are working together on the same project; we are from the same college. My classmate asked this question on SO, but the answers did not solve the problem, so I asked it from my account. He also asked on Data Science Stack Exchange but did not get any response. – Subhawna Aug 07 '20 at 04:25
  • I call this method with only one file at a time. If the file is small it works; otherwise the RAM runs out and it crashes. – Subhawna Aug 07 '20 at 04:28
  • Looks like https://stackoverflow.com/questions/63265336/ram-crashed-for-xml-to-dataframe-conversion-function, doesn't it? – balderman Aug 07 '20 at 20:49

0 Answers