
I've looked through a number of support pages, examples, and documents, but I am still stumped as to how to achieve what I'm after using Python.

I need to parse an XML feed and extract only a few specific values from the document, which is where I am stuck.

The xml looks like the following:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed>
 <title type="text">DailyTreasuryYieldCurveRateData</title>
 <id></id>
 <updated>2014-12-03T07:44:30Z</updated>
 <link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
 <entry>
 <id></id>
<title type="text"></title>
<updated>2014-12-03T07:44:30Z</updated>
<author>
  <name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6235)" />
<category />
<content type="application/xml">
  <m:properties>
    <d:Id m:type="Edm.Int32">6235</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">2014-12-01T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double">0.01</d:BC_1MONTH>
    <d:BC_3MONTH m:type="Edm.Double">0.03</d:BC_3MONTH>
    <d:BC_6MONTH m:type="Edm.Double">0.08</d:BC_6MONTH>
    <d:BC_1YEAR m:type="Edm.Double">0.13</d:BC_1YEAR>
    <d:BC_2YEAR m:type="Edm.Double">0.49</d:BC_2YEAR>
    <d:BC_3YEAR m:type="Edm.Double">0.9</d:BC_3YEAR>
    <d:BC_5YEAR m:type="Edm.Double">1.52</d:BC_5YEAR>
    <d:BC_7YEAR m:type="Edm.Double">1.93</d:BC_7YEAR>
    <d:BC_10YEAR m:type="Edm.Double">2.22</d:BC_10YEAR>
    <d:BC_20YEAR m:type="Edm.Double">2.66</d:BC_20YEAR>
    <d:BC_30YEAR m:type="Edm.Double">2.95</d:BC_30YEAR>
    <d:BC_30YEARDISPLAY m:type="Edm.Double">2.95</d:BC_30YEARDISPLAY>
  </m:properties>
 </content>
</entry>
<entry>
<id></id>
<title type="text"></title>
<updated>2014-12-03T07:44:30Z</updated>
<author>
  <name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6236)" />
<category />
<content type="application/xml">
  <m:properties>
    <d:Id m:type="Edm.Int32">6236</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">2014-12-02T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double">0.04</d:BC_1MONTH>
    <d:BC_3MONTH m:type="Edm.Double">0.03</d:BC_3MONTH>
    <d:BC_6MONTH m:type="Edm.Double">0.08</d:BC_6MONTH>
    <d:BC_1YEAR m:type="Edm.Double">0.14</d:BC_1YEAR>
    <d:BC_2YEAR m:type="Edm.Double">0.55</d:BC_2YEAR>
    <d:BC_3YEAR m:type="Edm.Double">0.96</d:BC_3YEAR>
    <d:BC_5YEAR m:type="Edm.Double">1.59</d:BC_5YEAR>
    <d:BC_7YEAR m:type="Edm.Double">2</d:BC_7YEAR>
    <d:BC_10YEAR m:type="Edm.Double">2.28</d:BC_10YEAR>
    <d:BC_20YEAR m:type="Edm.Double">2.72</d:BC_20YEAR>
    <d:BC_30YEAR m:type="Edm.Double">3</d:BC_30YEAR>
    <d:BC_30YEARDISPLAY m:type="Edm.Double">3</d:BC_30YEARDISPLAY>
  </m:properties>
</content>
</entry>
</feed>

This XML document gets a new entry appended each day for the duration of the month, then resets and starts again on the 1st of the next month.

I need to extract the date from d:NEW_DATE and the value from d:BC_10YEAR. When there is just a single entry this is no problem, but I am struggling to work out how to go through the file and extract the relevant date and value from each entry block.

Any assistance is very much appreciated.

Peter Louw
  • I had to remove a bunch of URL links in the XML so that I could post it. I don't believe their absence should have any impact on the solution, though. – Peter Louw Dec 03 '14 at 12:17
  • http://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python –  Dec 03 '14 at 12:25

1 Answer


BeautifulSoup is probably the easiest way to do what you're looking for:

# bs4 is the maintained successor of the old BeautifulSoup 3 package
from bs4 import BeautifulSoup

with open('datafile.xml', 'r') as f:
    xmldoc = f.read()

# html.parser lowercases tag names, hence 'd:new_date' rather than 'd:NEW_DATE'
bs = BeautifulSoup(xmldoc, 'html.parser')

for entry in bs.find_all('entry'):
    props = entry.find('m:properties')
    print(props.find('d:new_date').get_text())
    print(props.find('d:bc_10year').get_text())

You can then replace the print lines with whatever you want to do with the data (append it to a list, etc.).
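
If you would rather stay in the standard library, the same loop can be sketched with xml.etree.ElementTree, collecting the values into a list instead of printing them. This is only a sketch under some assumptions: the names NS, SAMPLE and extract_10year are made up for illustration, and the namespace URIs are the usual Atom and OData ones, which I am assuming the real feed declares (the XML in the question had its URLs stripped, so check them against the xmlns attributes of the actual document).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed namespace URIs (the usual Atom / OData ones). The XML in the
# question had its URLs removed, so verify these against the real feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Trimmed-down sample with the same shape as the feed in the question.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:NEW_DATE m:type="Edm.DateTime">2014-12-01T00:00:00</d:NEW_DATE>
        <d:BC_10YEAR m:type="Edm.Double">2.22</d:BC_10YEAR>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:NEW_DATE m:type="Edm.DateTime">2014-12-02T00:00:00</d:NEW_DATE>
        <d:BC_10YEAR m:type="Edm.Double">2.28</d:BC_10YEAR>
      </m:properties>
    </content>
  </entry>
</feed>"""


def extract_10year(xml_text):
    """Return a list of (date, rate) tuples, one per <entry>."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("atom:entry", NS):
        props = entry.find("atom:content/m:properties", NS)
        if props is None:
            continue  # entry without a properties block
        date_text = props.findtext("d:NEW_DATE", namespaces=NS)
        rate_text = props.findtext("d:BC_10YEAR", namespaces=NS)
        if not date_text or not rate_text:
            continue  # skip entries with a missing or empty value
        day = datetime.strptime(date_text, "%Y-%m-%dT%H:%M:%S").date()
        rows.append((day, float(rate_text)))
    return rows


rows = extract_10year(SAMPLE)
for day, rate in rows:
    print(day.isoformat(), rate)
```

For the real file you would pass its contents to extract_10year instead of SAMPLE. Unlike the BeautifulSoup version, ElementTree keeps tag names case-sensitive and needs the namespace prefixes to be declared in the document, so it will not parse the XML exactly as pasted in the question.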

Jakob
  • Thanks for this, I have been playing around a bit with BeautifulSoup so will definitely give this a go and let you know if it has worked. – Peter Louw Dec 03 '14 at 12:33
  • With a little tweaking of the above I have managed to get this working the way I needed. Many thanks! – Peter Louw Dec 03 '14 at 13:18
  • Not to worry, glad I could help :). Feel free to give the answer a +1 upvote for usefulness ;) – Jakob Dec 03 '14 at 15:16
  • As soon as I have enough reputation to upvote the answer, I will! – Peter Louw Dec 05 '14 at 07:58