How to crawl IMDB? [Read more] button not being pressed

Question

I'm extracting IMDB movie reviews.

The problem is that, to bring up all of the movie reviews, the [Read more] button must be pressed.

But once all the reviews have been loaded, I don't know how to detect that and stop.

I currently handle this by polling. Is there a smarter way to handle it?

When there is more to read:

[screenshot not available]

When there is nothing more to read:

[screenshot not available]

Thank you!

https://stackoverflow.com/questions/1966503/does-imdb-provide-an-api#7744369 — Smart Manoj, Jun 03 '19 at 05:18

score 0 · Accepted Answer · answered Jun 03 '19 at 05:49

If you are doing this in Python, you can use XPath to extract elements from the HTML page. An example of retrieving reviews is given below. You can use try/except so that the loop ends if there is no more information on the page. The example below might help:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# `driver` is an already-created Selenium WebDriver; `page` is defined elsewhere in the original script.
reviews = driver.find_elements(By.XPATH, '//article[@itemprop = "review"]')
for review in reviews:

    # Initialize an empty dictionary for each review
    review_dict = {}

    # Find xpaths of the fields desired as columns in future data frame
    # We use the try/except statements to account for the fact that the reviews are not required to have
    # all the fields listed below, and if a review does not have a certain field we wish to make the
    # corresponding field blank in that particular row, rather than quit upon receiving an error.
    try:
        # Note: this XPath starts with '//', so it searches the whole page rather than only this review.
        airline = review.find_element(
            By.XPATH, '//div[@class = "review-heading"]//h1[@itemprop = "name"]').text
    except NoSuchElementException:
        airline = page
    try:
        overall = review.find_element(By.XPATH, './/span[@itemprop = "ratingValue"]').text
    except NoSuchElementException:
        overall = ""

In the same way, you can use XPath lookups for your IMDB case, wrapped in try/except so that no error is raised when there is nothing more to read.
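Applied to the [Read more] button itself, the same idea could look roughly like the sketch below: keep clicking while the button can still be found, and stop as soon as the lookup fails. This is only a sketch under assumptions. The names `click_until_gone` and `selenium_finder` are hypothetical helpers, not part of Selenium, and the XPath for the button has to come from inspecting the actual page. The loop is kept separate from Selenium so it can be tried with a stub; the Selenium part uses an explicit wait (`WebDriverWait`) in place of a fixed sleep.

```python
import time
from typing import Any, Callable, Optional


def click_until_gone(
    find_button: Callable[[], Optional[Any]],
    max_clicks: int = 1000,
    pause: float = 0.0,
) -> int:
    """Click the button returned by find_button until it stops appearing.

    find_button must return an object with a .click() method, or None
    once the button can no longer be found. Returns the number of clicks.
    max_clicks is a safety limit so the loop cannot run forever.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break  # nothing more to read
        button.click()
        clicks += 1
        if pause:
            time.sleep(pause)  # optional extra delay between clicks
    return clicks


def selenium_finder(driver: Any, button_xpath: str, timeout: float = 5.0):
    """Build a find_button callable for a Selenium WebDriver.

    Selenium is imported here so the loop above can be used without it.
    button_xpath is an assumption: it must match the real button markup.
    """
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def find_button():
        try:
            # Wait up to `timeout` seconds for the button to become clickable.
            return WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.XPATH, button_xpath))
            )
        except TimeoutException:
            return None  # button did not show up in time: treat as finished

    return find_button
```

With a real driver this would be called as `click_until_gone(selenium_finder(driver, LOAD_MORE_XPATH))`, where `LOAD_MORE_XPATH` is a placeholder for whatever XPath locates the button on the page. The timeout plays the role of the polling delay: if the button does not become clickable within that time, the loop assumes all reviews have been loaded.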

Thank you! As a limit, though, I think I should still add a short [time.sleep]. Anyway, thanks. — , Jun 03 '19 at 08:33
Yes, you should add that; otherwise you might miss information. — Nitesh Jindal, Jun 03 '19 at 17:08
