How to parse XML using Python

Question

I have the XML file below:

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:cq="http://www.day.com/jcr/cq/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    cq:lastReplicated="{Date}2016-03-02T15:23:40.679-05:00"
    cq:lastReplicatedBy="XXXXt"
    cq:lastReplicationAction="Activate"
    jcr:description="Procedure"
    jcr:mixinTypes="[cq:ReplicationStatus]"
    jcr:primaryType="cq:Tag"
    jcr:title="Lung Volume Reduction Surgery"
    sling:resourceType="cq/tagging/components/tag"/>

I am trying to parse this XML file using ElementTree, but I am not able to extract "Lung Volume Reduction Surgery", which is the value of jcr:title.

I have already tried BeautifulSoup, regex, and ElementTree, but none of them worked.

Below is the code that I used for ElementTree:

import xml.etree.ElementTree as ET
xml="Actual xml document"
xml.find('./root').attrib['title']

I am a beginner in XML parsing and have spent more than 3 hours on this XML file, but I am still unable to get the value of jcr:title. Any help will be greatly appreciated.

You need to use the namespace – Padraic Cunningham May 31 '16 at 21:28

score 1 · Answer 1 · answered May 31 '16 at 21:27


Here is one way, using xml.etree.ElementTree:

from xml.etree import ElementTree as ET

tree = ET.parse('input.xml')
root = tree.getroot()

jcr_namespace = "http://www.jcp.org/jcr/1.0"

print(root.attrib[ET.QName(jcr_namespace, 'title')])
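
The same lookup can also be written with the "{namespace-uri}localname" form that ElementTree uses internally for namespaced names. This is only a minimal sketch: the XML is a trimmed copy of the document from the question, inlined so the snippet runs on its own, and the names XML_DOC and JCR_NS are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (assumption: inlined here
# instead of being read from input.xml, so the sketch is self-contained).
XML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:cq="http://www.day.com/jcr/cq/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="cq:Tag"
    jcr:title="Lung Volume Reduction Surgery"
    sling:resourceType="cq/tagging/components/tag"/>
"""

# Namespace URI bound to the "jcr" prefix in the document.
JCR_NS = "http://www.jcp.org/jcr/1.0"

# Parse from bytes so the encoding declaration is handled by the parser.
root = ET.fromstring(XML_DOC.encode("utf-8"))

# The prefix "jcr:" is not part of the stored name; ElementTree keys the
# attribute as "{http://www.jcp.org/jcr/1.0}title".
title = root.get("{%s}title" % JCR_NS)
print(title)
```

Note that the root element is namespaced too, so its tag is "{http://www.jcp.org/jcr/1.0}root" rather than "root", and the value is an attribute of that element rather than a child tag. That is why find('./root') and attrib['title'] in the question do not match anything.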

answered May 31 '16 at 21:27

Robᵩ
