0

Trying to parse info from a xml file that's hosted on a site. I'm making a tv addon for xbmc and my issue is that the info is all on on page and i only want to parse in sections like all of season 1! Where it only shows Season 1 in one spot then all of the episode below it then on season 2. I'm not sure how to write it type of code to only pull up season 1 if in click on season 1! Below is what i got going on:

    if type == 'tv_seasons':
         match=re.compile('<Season no="(.+?)">').findall(content)
         for seasonnumber in match:                
             item_url = new_url
             item_title = 'Season ' + seasonnumber
             item_id = common.CreateIdFromString(title + ' ' + item_title)               
             self.AddContent(list, indexer, common.mode_Content, item_title, item_id, 'tv_episodes', url=item_url, name=name, season=seasonnumber)

     elif type == 'tv_episodes':
         from entertainment.net import Net
         net = Net()
         content2 = net.http_GET(url).content
         match=re.compile('<episode><epnum>.+?</epnum><seasonnum>(.+?)</seasonnum>.+?<link>(.+?)</link><title>(.+?)</title>').findall(content2)
         for item_v_id_2, link_url, item_title  in match:
             item_v_id_2 = str(int(item_v_id_2))
             item_url = link_url
             item_id = common.CreateIdFromString(name + '_season_' + season + '_episode_' + item_v_id_2)
             self.AddContent(list, indexer, common.mode_File_Hosts, item_title, item_id, type, url=item_url, name=name, season=season, episode=item_v_id_2)

So now i'm working with this but still not working out for me.

        tree2 = ET.parse(urllib.urlopen(url))
        root2 = tree2.getroot()
        seasonnum = root2.findall("Show/Episodelist/Season[@no='%s']/episode/seasonnum" % season)
        seasonnumtext = seasonnum.text
        title = root2.findall("Show/Episodelist/Season[@no='%s']/episode/title" % season)
        item_title = title.text
        item_v_id_2 = str(int(seasonnumtext))
        item_url = url
        item_id = common.CreateIdFromString(name + '_season_' + season + '_episode_' + item_v_id_2)
        self.AddContent(list, indexer, common.mode_File_Hosts, item_title, item_id, type, url=item_url, name=name, season=season, episode=item_v_id_2)
Mikewave
  • 23
  • 3
  • 4
    `re` is not really the best tool for xml. There are a few dedicated solutions for that https://wiki.python.org/moin/PythonXml – njzk2 Feb 19 '14 at 18:49
  • pls add the relevant piece of html to your question – Guy Gavriely Feb 19 '14 at 18:50
  • http://services.tvrage.com/myfeeds/search.php?key=ag6txjP0RH4m0c8sZk2j&show=black%20sails here is the xml file – Mikewave Feb 19 '14 at 19:00
  • This question will probably help: http://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python – GermanK Feb 19 '14 at 19:07
  • If any of your ´re.compile´ need to match over linebreak you need to compile it with a suitable flag such at ´re.DOTALL´ for the dot to actually match the linebreak – deinonychusaur Feb 19 '14 at 19:34
  • seasonnum = root2.findall("Show/Episodelist/Season[@no='%s']/episode/seasonnum" % season) So i tried to do this but seems like it's not working! – Mikewave Feb 19 '14 at 21:15

1 Answers1

2

I would recommend using Python XML Parser. You can then traverse the XML tree in a similar manner to Python dictionaries and lists.

StephenH
  • 1,126
  • 11
  • 24