The Internet Movie Database does not allow scraping from their website, but does provide an FTP site with text files that can be downloaded and used for research: http://www.imdb.com/interfaces

How can I extract reviews and the corresponding ratings from this FTP server, preferably in Python?

Zach
  • Did you try downloading one of the text files and looking at it, or reading the documentation to see what it contains? If you do so, you can make an effort to parse the text yourself. When you do so, and run into problems, you can then explain the problem, post the code that you've written that isn't working as you expect, and ask a specific question related to that code, and we can try to help. Good luck. – Ken White Jul 18 '14 at 20:55
  • I can't actually find the files containing user reviews and ratings. I have heard that such data exists (e.g. http://www.cs.cornell.edu/people/pabo/movie-review-data/) but I have not been able to identify the source files on the FTP server. – Zach Jul 18 '14 at 20:57
  • The link you provided in your question explains that the files are in gzip (compressed) format, and clicking on any of the indicated ftp server links in a web browser provides a list of the .gz files (clearly named, such as `ratings.gz`) that are available; every one of those files is available to download, either via your web browser, an ftp client, or your own code that does so. What exactly are you having trouble locating? – Ken White Jul 18 '14 at 21:38
  • These types of questions are always welcome on OpenData.SX. See, for example, http://opendata.stackexchange.com/q/1073/1511 – philshem Oct 27 '14 at 10:44

1 Answer

Reviews are not distributed by IMDb in their plain text data files.

For all the other data, you can parse the files and store the results in a SQL database using IMDbPY, or look at its source code to see how to parse only the information that is relevant to you.
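If you only need the ratings (not reviews, which are not in these files), a hand-rolled parser may be enough. The following is a minimal sketch, not IMDbPY's own method. It assumes each data row in the ratings file looks like `<vote distribution> <votes> <rating> <title>`, indented by whitespace; check this against the file you actually download. The function names, the file name in the usage note, and the `latin-1` encoding are assumptions made for illustration.

```python
import gzip
import re

# Assumed row layout (verify against the real file):
#   leading spaces, 10-character vote distribution, vote count, rating, title
RATING_ROW = re.compile(r"^\s+([0-9.*]{10})\s+(\d+)\s+(\d+\.\d)\s+(.+?)\s*$")


def parse_ratings(lines):
    """Yield (title, rating, votes) for every line that looks like a data row."""
    for line in lines:
        match = RATING_ROW.match(line)
        if match:
            _distribution, votes, rating, title = match.groups()
            yield title, float(rating), int(votes)


def read_ratings_gz(path, encoding="latin-1"):
    """Stream rows from a gzip-compressed ratings file (encoding is an assumption)."""
    with gzip.open(path, "rt", encoding=encoding) as handle:
        yield from parse_ratings(handle)
```

Usage would be something like `for title, rating, votes in read_ratings_gz("ratings.list.gz"): ...`, where the file name is whatever you downloaded from the FTP site. Header and footer lines that do not match the pattern are skipped silently, so it is worth spot-checking the output against the raw file.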

Davide Alberani