Finding information on a website without an external module

Question

I am writing a program in Python where you search for a TV show or movie, and it gives you the following from IMDb:

The title, year, rating, age rating, and synopsis.

I want to use no external modules at all, only the ones that come with Python 3.4.

I know I will have to use urllib, but I do not know where to go from there.

How would I do this?

Why the arbitrary restrictions? What have you tried so far yourself? What do you know about HTML parsing, have you looked if IMDb offers an API perhaps? — Martijn Pieters, Mar 29 '14 at 16:48
[Does IMDB provide an API?](http://stackoverflow.com/q/1966503) lists several options where all you have to do is import the `json` module to handle the returned data. — Martijn Pieters, Mar 29 '14 at 16:49
I used [this](http://www.omdbapi.com/) and I ask the user to enter a movie name. Then I do `url = urllib.request.urlopen("http://www.omdbapi.com/?t="+title+"&r=XML")`, how would I extract the information from there? — rtharper, Mar 29 '14 at 17:10

score 1 · Answer 1 · answered Mar 29 '14 at 17:52

This is an example taken from here:

import json
from urllib.parse import quote
from urllib.request import urlopen

def search(title):
    # "s=" asks the API for a list of matching titles.
    API_URL = "http://www.omdbapi.com/?r=json&s=%s"
    # quote() escapes spaces and non-ASCII characters (as UTF-8) for the URL.
    url = API_URL % quote(title)
    data = urlopen(url).read().decode("utf-8")
    data = json.loads(data)
    # The API reports failures in the JSON body, not as an exception.
    if data.get("Response") == "False":
        print(data.get("Error", "Unknown error"))

    return data.get("Search", [])

Then you can do:

>>> search("Idiocracy")
[{'Year': '2006', 'imdbID': 'tt0387808', 'Title': 'Idiocracy'}]

answered Mar 29 '14 at 17:52

elyase


Wow, thanks. When I get the user to input the movie name and then call `search(movieTitle)`, it does not load the information. Is there any way to do that? Also, is there any way to load the rating, etc.? – rtharper Mar 29 '14 at 18:11
"it does not load the information"?? What information? Do you get an error? – elyase Mar 29 '14 at 18:30
It doesn't print out `[{'Year': '2006', 'imdbID': 'tt0387808', 'Title': 'Idiocracy'}]` when I let the user input the movie name – rtharper Mar 29 '14 at 18:47
That suggests that there's a problem with your code to accept user input. – khagler Mar 30 '14 at 07:38
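
For the follow-up about user input and the rating, here is a minimal sketch. It assumes that OMDb's title lookup (`t=` instead of `s=`) returns a single JSON object with fields named `Title`, `Year`, `imdbRating`, `Rated` and `Plot`. The helper names `build_url`, `summarize` and `lookup` are made up for illustration, and the `apikey` parameter is only sent if you pass one (the service may require a key).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://www.omdbapi.com/"  # same service as in the answer above

# Assumed field names in the JSON returned by a title lookup.
FIELDS = ("Title", "Year", "imdbRating", "Rated", "Plot")


def build_url(title, api_key=None):
    """Build a title-lookup URL; urlencode escapes spaces and the like."""
    params = [("t", title.strip()), ("r", "json")]
    if api_key:  # hypothetical: only needed if the service asks for a key
        params.append(("apikey", api_key))
    return API_URL + "?" + urlencode(params)


def summarize(data):
    """Pick the interesting fields out of a decoded response dict."""
    if data.get("Response") == "False":
        return None
    return {name: data.get(name, "N/A") for name in FIELDS}


def lookup(title, api_key=None):
    """Fetch one title over the network and return its summary (or None)."""
    with urlopen(build_url(title, api_key)) as response:
        data = json.loads(response.read().decode("utf-8"))
    return summarize(data)
```

With that in place, something like `info = lookup(input("Movie title: "))` followed by `print(info)` should show the result. If nothing appears when you call a function from a script, one likely cause is a missing `print`: the `>>>` prompt echoes return values, a script does not.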

Adam · Answer 2 · 2014-03-30T07:22:38.260

This may be too complex, but I look at the page's HTML source, find where the information I want is, and then extract it.

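import urllib.parse
import urllib.request

def search(title):
    # Search IMDb; quote() makes titles with spaces or special characters URL-safe.
    html = urllib.request.urlopen("http://www.imdb.com/find?q=" + urllib.parse.quote(title)).read().decode("utf-8")
    # Link of the first search result (the marker string is 34 characters long).
    f = html.find("<td class=\"result_text\"> <a href=\"", 0) + 34
    openlink = ""
    while html[f] != "\"":
        openlink += html[f]
        f += 1
    # Open the page of that first result.
    html = urllib.request.urlopen("http://www.imdb.com" + openlink).read().decode("utf-8")
    # Title and year (the marker string is 35 characters long).
    f = html.find("<meta property='og:title' content=\"", 0) + 35
    titleyear = ""
    while html[f] != "\"":
        titleyear += html[f]
        f += 1

    # Rating: read up to the next "/" (the marker string is 24 characters long).
    f = html.find("title=\"Users rated this ", 0) + 24
    rating = ""
    while html[f] != "/":
        rating += html[f]
        f += 1

    # Short description (the marker string is 34 characters long).
    f = html.find("<meta name=\"description\" content=\"", 0) + 34
    shortdescription = ""
    while html[f] != "\"":
        shortdescription += html[f]
        f += 1
    print(titleyear, rating, shortdescription)
    return (titleyear, rating, shortdescription)

search("friends")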

The number added to f has to be exactly right: it is the length of the string you are searching for, because find() returns the position of the first character of that string.
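
To avoid counting by hand, the offset can be computed with `len()`. This is a minimal sketch: `extract_between` is a hypothetical helper name, and the sample HTML is invented for the demonstration, not copied from IMDb.

```python
def extract_between(text, start_marker, end_marker, default=None):
    """Return the text between start_marker and the next end_marker."""
    start = text.find(start_marker)
    if start == -1:  # find() returns -1 when the marker is missing
        return default
    start += len(start_marker)  # skip past the marker itself
    end = text.find(end_marker, start)
    if end == -1:
        return default
    return text[start:end]


# Invented sample markup, only to show the call.
sample = '<meta name="description" content="A short plot summary.">'
description = extract_between(sample, 'content="', '"')
print(description)
```

Checking for `-1` also means a changed page layout gives you `None` instead of a slice taken from the wrong place.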

It looks bad. Is there a simpler way to do it?
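
One possibly simpler option that still uses only the standard library is `html.parser`. This is a sketch, assuming the values of interest sit in `<meta>` tags with a `name` or `property` attribute; the `MetaCollector` class name and the sample markup are illustrative, not taken from IMDb.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the content of <meta> tags, keyed by property or name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        key = attrs.get("property") or attrs.get("name")
        if key and attrs.get("content") is not None:
            self.meta[key] = attrs["content"]


# Invented sample markup, only to show the call.
sample = """
<html><head>
<meta property='og:title' content="Friends (TV Series 1994-2004)">
<meta name="description" content="Six friends in New York.">
</head></html>
"""

parser = MetaCollector()
parser.feed(sample)
print(parser.meta["og:title"])
print(parser.meta["description"])
```

The parser handles quoting and attribute order, so no offsets need to be counted. Values that are not in `<meta>` tags, such as the rating, would still need their own handling.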
