How to scrape movie information from the IMDB website?

Question

I am new to Python and trying to scrape IMDB. I am scraping the list of the top 250 IMDB movies and want to get information from each movie's own page, for example the length of each movie.

I already have a list of unique URLs. I want to loop over this list and, for every URL in it, retrieve the 'length' of that movie. Is it possible to do this in a single piece of code?

for URL in urlofmovie:
    htmlsource = requests.get(URL)
    tree_url = html.fromstring(htmlsource)
    lengthofmovie = tree_url.xpath('//*[@class="subtext"]')

I expect that lengthofmovie will become a list of all the lengths of the movies. However, it already goes wrong at line 2: the htmlsource.

What is in urlofmovie? Can you post the full code? What error are you getting? — Andrew Daly, May 13 '19 at 10:58
Possible duplicate of [Does IMDB provide an API?](https://stackoverflow.com/questions/1966503/does-imdb-provide-an-api) — Sayse, May 13 '19 at 11:03
"I expect that 'lengthofmovie' will become a list of all the lengths of the movies" => it will not - no language has mind-reading abilities, so if you want a list you have to use a list. — bruno desthuilliers, May 13 '19 at 11:08
"However, it already goes wrong at line 2: the htmlsource." => that's a different question. Please post one question per problem. Also, when you have an error in your code, you're supposed to post the exact error message and the full traceback - but in this case, the error is very probably due to the fact that `requests.get` returns a `Response` object, not a string. You want the response's `.text` attribute instead (cf `requests` doc). — bruno desthuilliers, May 13 '19 at 11:12
May I suggest a better way to do this? Try this: https://pypi.org/project/IMDbPY/ — Underoos, May 13 '19 at 11:16
Be sure to check the terms and conditions of a site before you decide to scrape it. The IMDb Terms of Use state, `"Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below."` — chitown88, May 13 '19 at 15:35

andreygold · Answer 1 · 2019-05-13T11:21:43.840

To make it a list you should first create a list and then append each length to that list.

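import requests
from lxml import html

length_list = []
for URL in urlofmovie:
    htmlsource = requests.get(URL)
    # html.fromstring() needs the page markup, not the Response object
    tree_url = html.fromstring(htmlsource.text)
    # xpath() returns a list of the matching elements on this page
    length_list.append(tree_url.xpath('//*[@class="subtext"]'))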

Small tip: since you are new to Python, I suggest going over the PEP 8 conventions. Clear variable names make life easier for you and for other developers (urlofmovie -> urls_of_movies).

"However, it already goes wrong at line 2: the htmlsource."

Please provide the exception you are receiving.
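
Without the traceback this is only a guess, but one likely cause, as bruno desthuilliers points out in the comments, is that `requests.get` returns a response object rather than the page markup. Below is a minimal sketch of the whole loop under that assumption. The function names `extract_subtext` and `collect_subtexts`, the `fetch` parameter, and the sample HTML are hypothetical and only for illustration. The `subtext` class comes from the question and may not match the markup IMDB serves today.

```python
import requests
from lxml import html


def extract_subtext(page_source):
    """Return the whitespace-normalised text of every element whose class is "subtext"."""
    tree = html.fromstring(page_source)
    # xpath() returns a list of elements; text_content() flattens each one to a string.
    return [
        " ".join(element.text_content().split())
        for element in tree.xpath('//*[@class="subtext"]')
    ]


def collect_subtexts(urls, fetch=None):
    """Fetch every URL and return one list of "subtext" strings per page."""
    if fetch is None:
        # Default fetcher: hand lxml the response body (.text), not the Response object.
        def fetch(url):
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.text

    return [extract_subtext(fetch(url)) for url in urls]


# Made-up snippet standing in for a downloaded page (an assumption, not real IMDB markup).
sample_page = (
    '<html><body><div class="subtext">'
    "PG-13 | <time> 2h 22min </time> | Drama"
    "</div></body></html>"
)
print(extract_subtext(sample_page))
```

With a real list you would call `collect_subtexts(urls_of_movies)`. The optional `fetch` argument is there only so the parsing can be tried on saved or hand-written HTML without touching the network.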
