I am trying to use lxml.etree.parse and tree.xpath in Django to parse some content from an external RSS feed, but for some reason I am unable to get any results. I have used the method below successfully on other XML files, but I am having difficulties with this one.
Here is the XML I am trying to scrape:
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Open Library : Author Name</title>
  <link href="http://www.somedomain.org/people/atom/author_name" rel="self"/>
  <updated>2012-03-20T16:41:00Z</updated>
  <author>
    <name>somedomain.org</name>
  </author>
  <id>tag:somedomain.org,2007:/person_feed/123456</id>
  <entry>
    <link href="http://www.somedomain.org/roll_call/show/1234" rel="alternate"/>
    <id>
      tag:somedomain.org,2012-03-20:/roll_call_vote/1234
    </id>
    <updated>2012-03-20T16:41:00Z</updated>
    <title>Once upon a time</title>
    <content type="html">
      This is a book full of words
    </content>
  </entry>
</feed>
Here is my Django view:
def openauthors(request):
    tree = lxml.etree.parse("http://www.somedomain.org/people/atom/author_name")
    listings = tree.xpath("//author")
    listings_info = []
    for listing in listings:
        this_value = {
            "name": listing.findtext("name"),
        }
        listings_info.append(this_value)
    json_listings = '{"listings":' + simplejson.dumps(listings_info) + '}'
    if "callback" in request.GET.keys():
        callback = request.GET["callback"]
    else:
        callback = None
    if callback:
        response = HttpResponse("%s(%s)" % (
            callback,
            simplejson.dumps(listings_info)
            ), mimetype="application/json"
        )
    else:
        response = HttpResponse(json_listings, mimetype="application/json")
    return response
I have also tried the following paths, but each of them returns no results either:
listings = tree.xpath("feed/author")
listings = tree.xpath("/feed/author")
listings = tree.xpath("/author")
listings = tree.xpath("author")
Any help in the right direction would be appreciated.
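One thing that may be relevant: the feed declares a default namespace (xmlns="http://www.w3.org/2005/Atom"), so every element, including author and name, lives in that namespace, and an unqualified path like "//author" would not be expected to match. The snippet below is only a minimal sketch of that behavior. It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra dependencies, and the names SAMPLE_FEED, ATOM_NS, and author_names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns attribute on <feed>.
ATOM_NS = "http://www.w3.org/2005/Atom"

# Trimmed-down copy of the feed from the question (hypothetical sample).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Open Library : Author Name</title>
  <author>
    <name>somedomain.org</name>
  </author>
</feed>"""


def author_names(xml_text):
    """Return the text of every <name> inside a namespaced <author>."""
    root = ET.fromstring(xml_text)
    # The prefix "atom" is arbitrary; only the URI it maps to matters.
    ns = {"atom": ATOM_NS}
    return [
        author.findtext("atom:name", namespaces=ns)
        for author in root.findall(".//atom:author", ns)
    ]


root = ET.fromstring(SAMPLE_FEED)

# Unqualified lookup: finds nothing, because <author> is in the Atom namespace.
unqualified = root.findall(".//author")

# Namespace-qualified lookup: finds the element.
names = author_names(SAMPLE_FEED)

print(unqualified)
print(names)
```

With lxml the equivalent would presumably be passing a prefix map to xpath, for example tree.xpath("//atom:author", namespaces={"atom": "http://www.w3.org/2005/Atom"}), and qualifying the findtext("name") lookup in the same way.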