For navigating, searching, and modifying XML or HTML data, I have found the BeautifulSoup library very useful. For installation problems or more detail, see the BeautifulSoup documentation.
To read the value of one or more attributes of a tag:
from bs4 import BeautifulSoup
data = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml producer="poppler" version="0.48.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="246" left="135" width="178" height="16" font="1">PALS SOCIETY OF
CANADA</text>
<text top="261" width="86" height="16" font="1">13479 77 AVE</text>
</page>
</pdf2xml>"""
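# The "xml" parser requires the lxml package to be installed.
soup = BeautifulSoup(data, features="xml")
page_tag = soup.find_all('page')
for each_page in page_tag:
    text_tag = each_page.find_all('text')
    for text_data in text_tag:
        # Collapse any line break inside the element's text into one space.
        print("Text :", " ".join(text_data.text.split()))
        # .get() returns None when the attribute is missing.
        print("Left attribute :", text_data.get("left"))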
Output:
Text : PALS SOCIETY OF CANADA
Left attribute : 135
Text : 13479 77 AVE
Left attribute : None
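Several attributes can be read in one pass in much the same way. The following is only a minimal sketch under a few assumptions: `read_attributes` is a hypothetical helper name (it is not part of BeautifulSoup), the data is a trimmed copy of the sample above, and Python's built-in `html.parser` is used so that the snippet runs without lxml. For this small sample, `features="xml"` should give the same results.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample data, kept short for illustration.
data = """<page number="1">
<text top="246" left="135" width="178" height="16" font="1">PALS SOCIETY OF CANADA</text>
<text top="261" width="86" height="16" font="1">13479 77 AVE</text>
</page>"""

# Assumption: html.parser is used here only to avoid the lxml dependency.
soup = BeautifulSoup(data, "html.parser")


def read_attributes(tag, names, default=None):
    """Hypothetical helper: map each requested attribute name to its value.

    Attributes that are missing on the tag get `default` instead.
    """
    return {name: tag.get(name, default) for name in names}


# One dictionary per <text> element, holding only the requested attributes.
rows = [read_attributes(t, ["top", "left", "width"]) for t in soup.find_all("text")]
for row in rows:
    print(row)

# Keep only the <text> elements that actually carry a "left" attribute.
with_left = soup.find_all("text", attrs={"left": True})
print([t.get_text() for t in with_left])

# tag.attrs exposes every attribute of one element as a dictionary.
print(soup.find("text").attrs)
```

Passing `True` as the attribute value in `attrs` matches any element that has that attribute, whatever its value, which is a convenient way to skip elements where `.get("left")` would return None.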