I want to use the urllib module to send HTTP requests and grab data. I can get the data with the urlopen() function, but I'm not sure how to incorporate it into classes. I really need help with the Query class to move forward. From the query I need to pull the following feeds:

• Top Rated
• Top Favorites
• Most Viewed
• Most Recent
• Most Discussed

My issue is, I can't parse the XML document to retrieve this data. I also don't know how to use classes to do it.

Here is what I have so far:

import urllib  # this allows the program to send HTTP requests and to read the responses.

class Query: 
    '''Performs the actual HTTP requests and initial parsing to build the Video
    objects from the response.  It will also calculate the following information
    based on the video and user results.  '''

    def __init__(self, feed_id, max_results): 
        '''Takes as input the type of query (feed_id) and the maximum number of 
        results (max_results) that the query should obtain. The correct HTTP 
        request must be constructed and submitted. The results are converted 
        into Video objects, which are stored within this class.
        '''

        self.feed = feed_id
        self.max = max_results


        top_rated = urllib.urlopen("http://gdata.youtube.com/feeds/api/standardfeeds/top_rated")
        results_str = top_rated.read()
        splittedlist = results_str.split('<entry')
        top_rated.close()


    def __str__(self):
        ''' prints out the information on each video and Youtube user. '''
        pass


class Video:
    pass


class User:
    pass



#main function:  This handles all the user inputs and stuff.
def main():
    useinput = raw_input('''Welcome to the YouTube text-based query application.
You can select a popular feed to perform a query on and view statistical 
information about the related videos and users.

1) today
2) this week
3) this month 
4) since youtube started

Please select a time (or 'Q' to quit):''')
    secondinput = raw_input("\n1) Top Rated\n2) Top Favorited\n3) Most Viewed\n4) Most Recent\n5) Most Discussed\n\nPlease select a feed (or 'Q' to quit):")
    thirdinput = raw_input("Enter the maximum number of results to obtain:")

main()

toplist = []
top_rated = urllib.urlopen("http://gdata.youtube.com/feeds/api/standardfeeds/top_rated")
result_str = top_rated.read()
top_rated.close()
splittedlist = result_str.split('<entry')  # one chunk per <entry> element



x = splittedlist[1].find('title')  # index of the first 'title' in the first entry
snippet = splittedlist[1][x: x+75]  # text around the title (the next '<' marks the end of the title)
w = snippet.find(">")  # index just before the title text starts
z = snippet.find("<")  # index where the title text ends
titles = snippet[w+1:z]  # the title itself
toplist.append(titles)
print toplist
ArtisanSamosa
  • You should make your question more specific. "I've got coder's block" is probably not a good SO question. What kind of queries do you want to make? How are they related? What do you want your classes to represent? – millimoose Nov 29 '11 at 23:39
  • I updated my question to be more specific. Each class basically borrows the data obtained in the Query class. How do I use that class to parse the data at the points that I mentioned? The URL that I provided under top_rated provides the query for all the top rated videos. That can be edited to specific times and number of videos. – ArtisanSamosa Nov 30 '11 at 00:21

2 Answers

I assume that your challenge is parsing XML.

results_str = top_rated.read()
splittedlist = results_str.split('<entry')

And I see you are using string functions to parse XML. Such functions are based on finite automata (regular languages) and are NOT suited to parsing context-free languages such as XML. Expect the code to break very easily.

For more reasons, please refer to the question "RegEx match open tags except XHTML self-contained tags".

Solution: consider using an XML parser like ElementTree. It comes with Python (as xml.etree.ElementTree) and allows you to browse the XML tree pythonically. http://effbot.org/zone/element-index.htm

You may come up with code like:

import xml.etree.ElementTree as ET  # bundled with Python since 2.5
..
results_str = top_rated.read()
root = ET.fromstring(results_str)
for node in root:
    print node
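
To pull every video title rather than only the first, a parser-based loop might look like the sketch below. It assumes the response is an Atom feed, so the tags live in the Atom namespace; SAMPLE_FEED is a made-up stand-in for the real response body, and entry_titles is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed is Atom, so every tag carries this namespace prefix.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample standing in for the text returned by top_rated.read().
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Top Rated</title>
  <entry>
    <title>First video</title>
    <author><name>alice</name></author>
  </entry>
  <entry>
    <title>Second &lt;video&gt;</title>
    <author><name>bob</name></author>
  </entry>
</feed>"""


def entry_titles(xml_text):
    """Return the title text of every <entry> element in the feed."""
    root = ET.fromstring(xml_text)
    titles = []
    for entry in root.findall(ATOM + "entry"):
        # findtext returns the text of the first matching child, or None.
        titles.append(entry.findtext(ATOM + "title"))
    return titles


print(entry_titles(SAMPLE_FEED))
```

The feed's own title is skipped because only children of each entry are inspected, and escaped characters such as &lt; are decoded by the parser, which is exactly the kind of detail that slicing strings by hand tends to get wrong.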

"I also don't know how to use classes to do it."

Don't be in a rush to create classes :-)

In the above example, you are importing a module, not importing a class and instantiating/initializing it, like you do for Java. Python has powerful primitive types (dictionaries, lists) and considers modules as objects: so (IMO) you can go easy on classes.
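
As a rough illustration of how far plain dictionaries and lists can go, here is a small sketch. The records and their field names (title, author, views) are invented for the example and are not taken from the API; most_viewed and views_by_author are hypothetical helper names.

```python
# Hypothetical records; the field names are illustrative only.
videos = [
    {"title": "First video", "author": "alice", "views": 1200},
    {"title": "Second video", "author": "bob", "views": 300},
    {"title": "Third video", "author": "alice", "views": 4500},
]


def most_viewed(records):
    """Return the record with the highest view count."""
    return max(records, key=lambda record: record["views"])


def views_by_author(records):
    """Sum the view counts for each author."""
    totals = {}
    for record in records:
        author = record["author"]
        totals[author] = totals.get(author, 0) + record["views"]
    return totals


print(most_viewed(videos)["title"])
print(views_by_author(videos))
```

If the assignment requires Video and User classes, the same functions can later become methods; the data handling itself does not change much.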

You use classes to organize stuff, not because your teacher has indoctrinated you with "classes are good. Let's have lots of them".

Jesvin Jose
  • I have posted how I get the name of the first video. We basically have to use the classes that I posted. How can I use my parsing method to get the names of all the videos from the API? How can I get it to search for every 'title' that comes up, instead of stopping at the first? – ArtisanSamosa Nov 30 '11 at 22:19

Basically you want to use the Query class to communicate with the API.

def __init__(self, feed_id, max_results, time):
       qs = "http://gdata.youtube.com/feeds/api/standardfeeds/"+feed_id+"?max-results="+str(max_results)+"&time=" + time
       self.feed_id = feed_id
       self.max_results = max_results
       wo = urllib.urlopen(qs)
       result_str = wo.read()
       wo.close()
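
One possible variation, sketched below, builds the query string with urlencode instead of concatenating it by hand, and keeps URL construction separate from the network call so it can be checked without sending a request. build_feed_url and the url attribute are hypothetical names, and "today" is only an illustrative value for the time parameter.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

BASE_URL = "http://gdata.youtube.com/feeds/api/standardfeeds/"


def build_feed_url(feed_id, max_results, time):
    """Build the request URL for a standard feed (hypothetical helper)."""
    # A list of pairs keeps the parameters in a predictable order.
    params = [("max-results", max_results), ("time", time)]
    return BASE_URL + feed_id + "?" + urlencode(params)


class Query(object):
    """Stores the query settings and the URL built from them."""

    def __init__(self, feed_id, max_results, time):
        self.feed_id = feed_id
        self.max_results = max_results
        self.time = time
        # The request itself is left out here; only the URL is prepared.
        self.url = build_feed_url(feed_id, max_results, time)


print(Query("top_rated", 5, "today").url)
```

urlencode also escapes characters that are not safe in a URL, which plain string concatenation would pass through unchanged.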
ArtisanSamosa