How to print only certain XML elements

Question

I want to output a list of the main elements of a page. A summary of the response is shown below. I need a way to grab only the text between the <text> tags using Python. For the content below, the output should be:

Mathematics, Differential equation, Geometry

<language>english</language>
        <concepts>
            <concept>
                <text>Mathematics</text>
                <relevance>0.988094</relevance>
                <dbpedia>http://dbpedia.org/resource/Mathematics</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.04rjg</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvVjHd5wpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>
                <text>Differential equation</text>
                <relevance>0.729187</relevance>
                <dbpedia>http://dbpedia.org/resource/Differential_equation</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.050fdl</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvXXRFJwpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>
                <text>Geometry</text>
                <relevance>0.677052</relevance>
                <dbpedia>http://dbpedia.org/resource/Geometry</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.025x7g_</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvgcAf5wpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>

I haven't used anything yet. This is the response from the AlchemyAPI. I requested a URL using requests.get. The code above is a summarised version of the response. I was thinking of maybe a regex match or something? — user1100121, Oct 28 '15 at 20:47
There are lots of XML parsers in Python, so there is no need to reinvent the wheel with a bunch of regexes. See http://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python for some ideas. My recommendation would be `minidom`, since it sounds like you don't need anything too complicated. — user812786, Oct 28 '15 at 20:53
Yeah, `xpath` is your friend here; it's literally going to be something like `//concept/text/text()` — Shawn Mehan, Oct 28 '15 at 20:58

score 0 · Accepted Answer · answered Oct 28 '15 at 21:54

Use an XML parser rather than a regex; Python ships with several in the standard library. For example, with ElementTree:

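from xml.etree import ElementTree

# xmlstring is assumed to hold the full XML response body as a string
doc = ElementTree.fromstring(xmlstring)
# './/text' matches every <text> element at any depth below the root
for tag in doc.findall('.//text'):
    print(tag.text)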
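
To get the exact comma-separated output asked for in the question, the same approach can be wrapped up roughly as follows. This is only a sketch: the question shows a fragment of the response, so the sample string here is a trimmed copy wrapped in an assumed <results> root element, and concept_names is a hypothetical helper name, not part of any library.

```python
from xml.etree import ElementTree

# Hypothetical sample: a trimmed copy of the response from the question,
# wrapped in an assumed <results> root so that it is well-formed XML.
xmlstring = """<results>
    <language>english</language>
    <concepts>
        <concept>
            <text>Mathematics</text>
            <relevance>0.988094</relevance>
        </concept>
        <concept>
            <text>Differential equation</text>
            <relevance>0.729187</relevance>
        </concept>
        <concept>
            <text>Geometry</text>
            <relevance>0.677052</relevance>
        </concept>
    </concepts>
</results>"""


def concept_names(xml_text):
    """Return the text of every <text> element that sits directly under a <concept>."""
    root = ElementTree.fromstring(xml_text)
    # Restricting the path to concept/text skips any unrelated <text> elements.
    return [node.text for node in root.findall('.//concept/text')]


summary = ', '.join(concept_names(xmlstring))
print(summary)  # Mathematics, Differential equation, Geometry
```

With requests, the input would presumably be the body of the response object returned by requests.get, for example its .content attribute, provided the real response is a single well-formed XML document.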

Marco Tompitak
