
I'm an avid reader and someone interested in coding. As fellow readers know, searching for the next book to read is a ritual and a process of its own. I want to build a small tool to help with that.

What I want to do is crawl the pages of Goodreads and extract the books that satisfy the following criteria.

  1. It has more than 20,000 reviews
  2. It has an average rating above 4 stars
  3. Its 1-star and 2-star ratings are each less than 2%
  4. Its 3-star rating is less than 20%
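The four criteria above boil down to a simple predicate once you have a book's review count, average rating, and rating distribution. A minimal sketch (the function and parameter names here are illustrative, not from any real API):

```python
def meets_criteria(num_reviews, avg_rating, dist):
    """Check a book against the four criteria.

    dist maps star value (1-5) to the fraction of ratings at that star,
    e.g. {1: 0.01, 2: 0.01, 3: 0.10, 4: 0.30, 5: 0.58}.
    """
    return (
        num_reviews > 20_000          # criterion 1
        and avg_rating > 4.0          # criterion 2
        and dist[1] < 0.02            # criterion 3 (1-star share)
        and dist[2] < 0.02            # criterion 3 (2-star share)
        and dist[3] < 0.20            # criterion 4
    )

# Example: a book that passes all four checks
print(meets_criteria(25_000, 4.5, {1: 0.01, 2: 0.01, 3: 0.10, 4: 0.30, 5: 0.58}))
```

However you end up collecting the data, keeping the filter separate like this makes it easy to test without touching the network.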

I'm decent with Python and know a little Beautiful Soup. Equipped with these tools, can someone please guide me on how to proceed with my quest?

Thank you!

RAVI
  • What have you actually tried, so far? – Dragonthoughts Dec 04 '18 at 09:09
  • Your question as it is now appears too broad to get decent answers. You already mentioned Beautiful Soup, which is an excellent tool. Could you maybe try something on your own and get back here with a more specific question? – toti08 Dec 04 '18 at 09:27
  • If I'm looking at a single page, like a link for a single book, I can do this. But how can I do this for all the books that exist on Goodreads? I wouldn't even have the links for all the books on the website to just iterate over, right? – RAVI Dec 04 '18 at 09:58
  • 1
    Solution to access multiple links in https://stackoverflow.com/questions/40629457/scrape-multiple-urls-using-beautiful-soup. Instead I would advise to use the Goodreads API to access its data. https://www.hongkiat.com/blog/goodreads-ratings-api/ – Swati Singh Apr 12 '19 at 05:22
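The multiple-URL approach from the comments starts by collecting per-book links from a listing page, then visiting each one. A minimal sketch with Beautiful Soup, parsing a hand-written HTML fragment in place of a live Goodreads page (the `tableList`/`bookTitle` class names and URL shape are assumptions about the page markup, which may differ or change):

```python
from bs4 import BeautifulSoup

# Tiny stand-in for a Goodreads list page; a real crawler would fetch
# this HTML with e.g. requests and loop over the paginated list URLs.
html = """
<table class="tableList">
  <tr><td><a class="bookTitle" href="/book/show/1-sample-one">Sample One</a></td></tr>
  <tr><td><a class="bookTitle" href="/book/show/2-sample-two">Sample Two</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect absolute URLs for every book link on the page.
links = ["https://www.goodreads.com" + a["href"]
         for a in soup.select("a.bookTitle")]
print(links)
```

Each collected link can then be fetched and checked against the rating criteria, though as the comments note, the Goodreads API is the more robust route than scraping.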

0 Answers