Can anyone help me use Python to extract information from an XML file? This is my example XML:

<root>
    <number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
    </number>
</root>

What I want to print is everything between the root tags, exactly as it appears: the tags, the text between the tags, and the attributes inside the tags (in this case index="2"). I have tried itertext(), but that strips the tags and prints only the text between the root tags. So far I have a makeshift solution that prints only element.tag and element.text, but it does not print the end tags or the attributes. Any help would be appreciated! :)

Alex Hua
  • Have you reviewed previous posts on parsing XML data? The following may be helpful: http://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python – Jon May 15 '17 at 16:12

2 Answers

1

With s as your input,

s='''<root>
      <number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
        </number>
</root>'''

Find all elements named number and convert each one to a string with ET.tostring():

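import xml.etree.ElementTree as ET
root = ET.fromstring(s)
# Serialize each <number> element, including its tags and attributes.
for node in root.findall('.//number'):
    # encoding="unicode" returns a str; the default returns bytes in Python 3.
    print(ET.tostring(node, encoding="unicode"))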

Output:

<number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
        </number>
Keerthana Prabhakaran
  • Thank you so much! It works just how I wanted. However, I had to use ET.tostring(node, encoding="unicode") so that it would not display all the "\n" characters. :) – Alex Hua May 15 '17 at 18:31
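
The question asks for everything between the root tags, not just the number elements. A minimal sketch of one way to get that with ElementTree follows. The helper name inner_xml is hypothetical (it is not part of the library), and the sample string assumes the closing </number> tag is present so that the input parses. It also shows the bytes versus str difference mentioned in the comment above.

```python
import xml.etree.ElementTree as ET

s = '''<root>
    <number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
    </number>
</root>'''


def inner_xml(element):
    """Hypothetical helper: serialize what sits between an element's tags."""
    # element.text is the text before the first child (None if absent).
    parts = [element.text or ""]
    # tostring() on a child includes its tags, attributes, and tail text.
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


root = ET.fromstring(s)

# In Python 3 the default encoding yields bytes, which print() shows
# with escaped "\n" sequences; encoding="unicode" yields a str instead.
as_bytes = ET.tostring(root[0])
as_text = ET.tostring(root[0], encoding="unicode")

print(inner_xml(root))
```

Because the helper walks only the direct children of the element it is given, it should work for any element, not just the root.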
0
from bs4 import BeautifulSoup

xml = "<root><number index=\"2\"><info><info.RANDOM>Random Text</info.RANDOM></info></number></root>"
soup = BeautifulSoup(xml, "xml")

output = soup.prettify()
print(output[output.find("<root>") + 7:output.rfind("</root>")])    

The + 7 skips the six characters of <root> plus the newline that prettify() adds after it.
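
The hard-coded offset can be avoided by deriving it from the tag itself. The sketch below uses only plain string methods and does not depend on BeautifulSoup. The function name between_tags is hypothetical, and the pretty string is a hand-written stand-in for prettified output, an assumption made here so that the example is self-contained.

```python
def between_tags(text, tag="root"):
    """Hypothetical helper: return the text between <tag> and </tag>."""
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"
    start = text.find(open_tag)
    end = text.rfind(close_tag)
    if start == -1 or end == -1:
        raise ValueError("tag not found: " + tag)
    # Skip the opening tag by its real length instead of a magic number,
    # then drop the newlines left at either end.
    return text[start + len(open_tag):end].strip("\n")


# Assumed stand-in for prettified output (hand-written, not generated).
pretty = '''<?xml version="1.0" encoding="utf-8"?>
<root>
 <number index="2">
  <info>
   <info.RANDOM>
    Random Text
   </info.RANDOM>
  </info>
 </number>
</root>
'''

inner = between_tags(pretty)
print(inner)
```

This only handles an opening tag without attributes, which is the same limitation as the find("<root>") call above.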

Sank Finatra