0

I want to output a list of the main elements of a page. The summary is printed below. I need a way to only grab the text between the text tags using python. If successful I want the output of the content below to be:

Mathematics, Differential equation, Geometry

<language>english</language>
        <concepts>
            <concept>
                <text>Mathematics</text>
                <relevance>0.988094</relevance>
                <dbpedia>http://dbpedia.org/resource/Mathematics</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.04rjg</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvVjHd5wpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>
                <text>Differential equation</text>
                <relevance>0.729187</relevance>
                <dbpedia>http://dbpedia.org/resource/Differential_equation</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.050fdl</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvXXRFJwpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>
                <text>Geometry</text>
                <relevance>0.677052</relevance>
                <dbpedia>http://dbpedia.org/resource/Geometry</dbpedia>
                <freebase>http://rdf.freebase.com/ns/m.025x7g_</freebase>
                <opencyc>http://sw.opencyc.org/concept/Mx4rvgcAf5wpEbGdrcN5Y29ycA</opencyc>
            </concept>
            <concept>
user1100121
  • 53
  • 1
  • 2
  • 4
  • I haven't used anything yet. This is the response from the Alchemi API. I requested a URL using requests.get. The code above is a summarised version of the response. I was thinking of maybe a regex match or something? – user1100121 Oct 28 '15 at 20:47
  • There are lots of XML parsers in Python, no need to reinvent the wheel with a bunch of regexs. See http://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python for some ideas. My recommendation would be `minidom` since it sounds like you don't need anything too complicated. – user812786 Oct 28 '15 at 20:53
  • yeah, `xpath` is your friend here, literally going to be something like `/concept/text()` – Shawn Mehan Oct 28 '15 at 20:58

1 Answers1

0

You should look at some xml parsers. They're readily available. For example:

from xml.etree import ElementTree

doc = ElementTree.fromstring(xmlstring)
for tag in doc.findall('.//text'):
  print(tag.text)
Marco Tompitak
  • 648
  • 1
  • 5
  • 12