
I have an XML file, and I am trying to iterate through its tags to convert it to a pandas DataFrame. Right now I open the file in Excel as an "XML table", but that is very slow. I am looking for an equivalent process in Python.

I am trying to follow the code in several other Stack Overflow questions and articles, such as here, here, and here.

I believe I am facing two problems:

  1. Does the namespace declared on the root element affect how my XML is parsed?

  2. I don't want to name every tag explicitly, as the solution in section 19.7.1.6 of the ElementTree documentation does. I want every tag to become a column for each "Security"; if a Security element lacks a tag, that value should be null. I also want to avoid a messy chain of if/else statements.

The problem is that when I run the code:

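import xml.etree.ElementTree as et

xml_path = "filename.xml"  # placeholder: path to the XML file

tree = et.parse(xml_path)  # ElementTree for the whole document
root = tree.getroot()      # the <SecurityInformation> root element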

and then try to iterate as suggested in the links above, I cannot easily access the child nodes.
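One likely reason the child nodes seem unreachable: the root declares a default namespace (`xmlns="http://tempuri.org/SecurityInformation.xsd"`), so ElementTree stores every tag as `{namespace}localname`, and a lookup for a plain `Security` matches nothing. A minimal sketch against the sample file, inlined as a string here; the prefix `ns` in the mapping is an arbitrary, hypothetical name:

```python
import xml.etree.ElementTree as et

# The question's sample file, inlined so the sketch is self-contained.
XML = """<?xml version="1.0"?>
<SecurityInformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://tempuri.org/SecurityInformation.xsd">
    <Security>
        <Country>United States</Country>
    </Security>
</SecurityInformation>"""

root = et.fromstring(XML)

# Every tag carries the default namespace in {uri}name form.
print(root.tag)  # {http://tempuri.org/SecurityInformation.xsd}SecurityInformation

# An unqualified name does not match namespaced elements.
print(root.findall("Security"))  # []

# Map a prefix of our choosing to the namespace URI and use it in the path.
NS = {"ns": "http://tempuri.org/SecurityInformation.xsd"}
securities = root.findall("ns:Security", NS)
print(len(securities))  # 1
print(securities[0].find("ns:Country", NS).text)  # United States
```

On Python 3.8 or later, `root.findall("{*}Security")` also works; the `{*}` wildcard matches the tag in any namespace.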

Sample File:

<?xml version="1.0"?>
<SecurityInformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://tempuri.org/SecurityInformation.xsd">
    <Security>
        <Country>United States</Country>
    </Security>
</SecurityInformation> 
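For the second problem (one column per tag, null where a tag is missing, no if/else per tag), one approach is to build a dict per `Security` from whatever children it has and let pandas take the union of the keys. The sketch below assumes each `Security` holds only flat child elements, as in the sample, and uses `iterparse` because the real file is described as massive. The second `<Security>` with a `<Ticker>` tag is invented purely to show the missing-value behavior, and `local_name` / `securities_to_frame` are hypothetical helper names:

```python
import io
import xml.etree.ElementTree as et

import pandas as pd

# The question's sample plus a hypothetical second <Security> with an
# invented <Ticker> tag, so a missing value shows up in the first row.
XML = b"""<?xml version="1.0"?>
<SecurityInformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://tempuri.org/SecurityInformation.xsd">
    <Security>
        <Country>United States</Country>
    </Security>
    <Security>
        <Country>Canada</Country>
        <Ticker>XYZ</Ticker>
    </Security>
</SecurityInformation>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def securities_to_frame(source):
    """One row per <Security>, one column per child tag seen in any of them."""
    rows = []
    # iterparse streams the input; its default event is "end", which fires
    # once an element and all of its children have been read.
    for _event, elem in et.iterparse(source):
        if local_name(elem.tag) == "Security":
            rows.append(
                {
                    local_name(child.tag): child.text.strip() if child.text else None
                    for child in elem
                }
            )
            # Discard the finished element's children to limit memory use.
            elem.clear()
    # pandas uses the union of all dict keys as columns; absent keys become NaN.
    return pd.DataFrame(rows)


df = securities_to_frame(io.BytesIO(XML))
print(df)
```

With the real file, `iterparse` also accepts a path, so `securities_to_frame(xml_path)` should work the same way. Matching on the local name sidesteps the namespace entirely; if the file can contain other elements named `Security` at other depths, a stricter check would be needed.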
mzjn
rcwilkin1993
  • Why didn't you update your original question (https://stackoverflow.com/questions/61732011/how-to-read-xml-file-into-pandas-dataframe)? You found the delete link, and in that same row there is also an edit link. Please don't delete and re-post questions. That is frowned upon here and might cause you trouble down the road. – rene May 11 '20 at 16:54
  • @rene I originally edited the question, but after the edit it was completely different from what I had first asked. Is it better to leave the old (different) question and post a new one? Posting a new question just seemed more logical. – rcwilkin1993 May 11 '20 at 17:07
  • @rene thanks. You don't see them as different now because I'd already changed it. What would you suggest is the best route to get my question answered at this point? – rcwilkin1993 May 11 '20 at 17:16
  • Practice patience ... – rene May 11 '20 at 17:21
  • Oh, and are you sure there are namespace attributes on that closing `</SecurityInformation>` tag? It would be the first time I've encountered those. – rene May 11 '20 at 17:23
  • @rene there may not be. The file is massive and it would take me far too long to scroll to the bottom, so I assumed that is how the tag ended. – rcwilkin1993 May 11 '20 at 17:45
  • @mzjn sorry, I was trying to show what I've tried, but that might be causing confusion. Can you focus on the main point of the question? "How do I read an XML file (like this) into a pandas DataFrame, the way Excel reads an XML table?" – rcwilkin1993 May 11 '20 at 18:03
  • Does this answer your question? [How to convert an XML file to nice pandas dataframe?](https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe) – iacob Mar 25 '21 at 08:25

1 Answer


I've made a package for a similar use case. It could work here too.

pip install pandas_read_xml

You can then do something like:

import pandas_read_xml as pdx

df = pdx.read_xml('filename.xml', ['SecurityInformation'])

To flatten the nested result, you could use

df = pdx.flatten(df)

or

df = pdx.fully_flatten(df)
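If pandas 1.3 or later is available, the built-in `pandas.read_xml` may cover this case without an extra package. It is a different tool from the `pandas_read_xml` package above. A sketch for the sample file, assuming each `Security` holds only flat child elements; the prefix `doc` is an arbitrary name bound to the file's default namespace, and `parser="etree"` is chosen so lxml is not required (that parser supports only ElementTree's limited XPath):

```python
from io import StringIO

import pandas as pd

# The question's sample file, inlined so the sketch is self-contained.
XML = """<?xml version="1.0"?>
<SecurityInformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://tempuri.org/SecurityInformation.xsd">
    <Security>
        <Country>United States</Country>
    </Security>
</SecurityInformation>"""

# Bind a prefix of our choosing to the default namespace so XPath can name elements.
NS = {"doc": "http://tempuri.org/SecurityInformation.xsd"}

# Each <Security> becomes a row; its child tags become columns.
df = pd.read_xml(StringIO(XML), xpath="./doc:Security", namespaces=NS, parser="etree")
print(df)
```

For the real file, pass the path instead of the `StringIO` wrapper. Child tags missing from some `Security` elements should come back as missing values, which matches the "null if it doesn't have that tag" requirement.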
min