Working in Python, I want to parse an XML document I wrote and build a nested list of lists so that I can access the feeds later and parse them. The XML document resembles the following snippet:

<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>

I would like something like nested lists, where the innermost lists hold the content between the <f></f> tags and the outer level is named after the sources (e.g. source="reuters" would become reuters). Retrieving the info from the XML doc isn't a problem: I'm using ElementTree, looping over the nodes and calling node.get('source') etc. The problem is generating lists with the desired names and the different lengths that the different sources require. I have tried appending, but I am unsure how to append to a list whose name comes from the retrieved value. Would a dictionary be better? What would be the best practice in this situation, and how might I make this work? If any more info is required, just post a comment and I'll be sure to add it.

Stephan GM
  • How do you want to use the lists? If looking up both source and feed by key, you'll want nested dictionaries. If looking up source by key but then walking through all feeds for the source, you'll want a dictionary of lists. Etc. – Peter Raynham Jul 29 '14 at 03:05

1 Answer

From your description, a dictionary keyed by source name, with each value being the list of feed URLs for that source, might do the trick.
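To make the target shape concrete, here is a hand-written illustration of what that dictionary would hold for the sample document in the question. Nothing is parsed here; the variable names `news_sources` and `feed_counts` are just illustrative choices.

```python
# Hand-written illustration of the target shape; values copied from the
# question's XML sample rather than parsed from it.
news_sources = {
    "cbc": ["http://rss.cbc.ca/lineup/topstories.xml"],
    "bbc": [
        "http://feeds.bbci.co.uk/news/rss.xml",
        "http://feeds.bbci.co.uk/news/world/rss.xml",
        "http://feeds.bbci.co.uk/news/uk/rss.xml",
    ],
    "reuters": [
        "http://feeds.reuters.com/reuters/topNews",
        "http://feeds.reuters.com/news/artsculture",
    ],
}

# Look up one source by name, then walk through its feeds.
for feed_url in news_sources["bbc"]:
    print(feed_url)

# Each source can have a different number of feeds.
feed_counts = {name: len(feeds) for name, feeds in news_sources.items()}
```

Because the source name is a dictionary key rather than a variable name, there is no need to create a separately named list for each source.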

Here is one way to construct such a beast:

from lxml import etree
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source.xpath('./f')]
    for source in etree.parse('x.xml').xpath('/sources/sourceList')}

pprint(news_sources)

Another sample, without lxml or xpath:

import xml.etree.ElementTree as ET
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source]
    for source in ET.parse('x.xml').getroot()}

pprint(news_sources)

Finally, if you are allergic to list comprehensions:

import xml.etree.ElementTree as ET
from pprint import pprint

xml = ET.parse('x.xml')
root = xml.getroot()
news_sources = {}
for sourceList in root:
    sourceListName = sourceList.attrib['source']
    news_sources[sourceListName] = []
    for feed in sourceList:
        feedName = feed.text
        news_sources[sourceListName].append(feedName)

pprint(news_sources)
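
One caveat with the versions above: if the same source name appears on more than one sourceList element, the later one replaces the earlier one. A possible variant, sketched here with dict.setdefault, merges them instead. The helper name build_news_sources and the inline SAMPLE_XML (which adds a second "bbc" block that is not in the question's file) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Inline sample so the sketch runs without x.xml. The second "bbc" block is
# a hypothetical addition to show merging; it is not in the question's file.
SAMPLE_XML = """<?xml version="1.0"?>
<sources>
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    </sourceList>
</sources>
"""


def build_news_sources(xml_text):
    """Return a dict mapping each source name to a list of its feed URLs."""
    root = ET.fromstring(xml_text)
    news_sources = {}
    for source_list in root.iter("sourceList"):
        name = source_list.get("source")
        # setdefault returns the existing list for this name, or stores and
        # returns a new empty one, so repeated names accumulate feeds.
        feeds = news_sources.setdefault(name, [])
        feeds.extend(f.text.strip() for f in source_list.findall("f") if f.text)
    return news_sources


news_sources = build_news_sources(SAMPLE_XML)
```

With the sample above, news_sources["bbc"] ends up holding both BBC feeds, in document order.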
Robᵩ