Problems grabbing data with xpath and lxml in Python

Question

I'm trying to grab the contents of BC_1YEAR and NEW_DATE for the last entry in this data.

This is my Python code for Google App Engine:

import lxml.etree
from google.appengine.api import urlfetch

def foo():
    url = 'http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData$filter=month(NEW_DATE)%20eq%201%20and%20year(NEW_DATE)%20eq%202015'
    response = urlfetch.fetch(url)
    tree = lxml.etree.fromstring(response.content)
    nsmap = {'atom': 'http://www.w3.org/2005/Atom',
             'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices'}
    myData = tree.xpath("//atom:entry[last()]/d:BC_1YEAR", namespaces=nsmap)

But myData is an empty list while it should have been 0.2 from today's data. I've been trying for hours to get this working so any help would be greatly appreciated. I'm assuming NEW_DATE works similarly.

Perhaps you could also look after [your last question](http://stackoverflow.com/questions/14645199/curve-fitting-this-data-in-r) - and [**accept one of the answers**](http://stackoverflow.com/help/someone-answers) given there? — Mathias Müller, Jan 14 '15 at 17:24

score 1 · Accepted Answer · edited May 23 '17 at 11:50

As far as I can see, the correct URL is

http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015

whereas in your code there is (no ? after Data)

http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015

That's why your code currently produces the following XML:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code></code>
  <message xml:lang="en-US">Resource not found for the segment 'DailyTreasuryYieldCurveRateData$filter=month'.</message>
</error>

And of course there is no atom:entry in that error message.

Further, your XPath expression:

//atom:entry[last()]/d:BC_1YEAR

will not retrieve d:BC_1YEAR because d:BC_1YEAR is not an immediate child of atom:entry. Use

//atom:entry[last()]//d:BC_1YEAR

or, even better, register the m: prefix in the code and use

//atom:entry[last()]/atom:content/m:properties/d:BC_1YEAR

import lxml.etree
from google.appengine.api import urlfetch

def foo():
    url = 'http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=month%28NEW_DATE%29%20eq%201%20and%20year%28NEW_DATE%29%20eq%202015'
    response = urlfetch.fetch(url)
    tree = lxml.etree.fromstring(response.content)
    nsmap = {'atom': 'http://www.w3.org/2005/Atom',
             'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices',
             'm': 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'}
    myData = tree.xpath("//atom:entry[last()]/atom:content/m:properties/d:BC_1YEAR", namespaces=nsmap)

EDIT: As a response to your comment:

I want my code to work 'indefinitely' with as little maintenance as possible. I don't know what the purpose of namespaces really are and I wonder if these particular namespaces are generic and can be expected to stay that way for years to come?

I have already explained the purpose of namespaces in XML elsewhere - please look at that answer, too. Namespaces are never generic, in fact, they are the exact opposite of generic - they are supposed to be unique.

That said, there are ways to ignore namespaces. An expression like

//atom:entry[last()]//d:BC_1YEAR

Can be rewritten as

//*[local-name() = 'entry'][last()]//*[local-name() = 'BC_1YEAR']

to find elements regardless of their namespace. This would be an option if you had reason to believe the namespace URIs are going to change in the future.

Thanks! I do a myData[0].text afterwards to get the text. I actually concat the URL with current month and year and I must have made a mistake when I made a string for this code snippet. I understand why my xpath was wrong now that I've seen the correct way. I have another question. I want my code to work 'indefinitely' with as little maintenance as possible. I don't know what the purpose of namespaces really are and I wonder if these particular namespaces are generic and can be expected to stay that way for years to come? — questiondude, Jan 15 '15 at 08:54

Problems grabbing data with xpath and lxml in Python

1 Answers1