
I'm an avid reader and someone interested in coding. As fellow readers know, searching for the next book to read is a ritual and a process of its own. I want to build a small tool to help with that.

What I want to do is crawl the pages of Goodreads and extract the books that satisfy the following criteria.

  1. It has more than 20,000 reviews
  2. It has an average rating above 4 stars
  3. Its 1-star and 2-star ratings are each less than 2%
  4. Its 3-star rating is less than 20%
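The four criteria above boil down to a simple predicate once you have a book's review count, average rating, and rating distribution. A minimal sketch (the function and parameter names here are illustrative, not from any real API):

```python
def meets_criteria(num_reviews, avg_rating, dist):
    """Check a book against the four criteria.

    dist maps star value (1-5) to the fraction of ratings at that star,
    e.g. {1: 0.01, 2: 0.01, 3: 0.10, 4: 0.30, 5: 0.58}.
    """
    return (
        num_reviews > 20_000          # criterion 1
        and avg_rating > 4.0          # criterion 2
        and dist[1] < 0.02            # criterion 3 (1-star share)
        and dist[2] < 0.02            # criterion 3 (2-star share)
        and dist[3] < 0.20            # criterion 4
    )

# Example: a book that passes all four checks
print(meets_criteria(25_000, 4.5, {1: 0.01, 2: 0.01, 3: 0.10, 4: 0.30, 5: 0.58}))
```

However you end up collecting the data, keeping the filter separate like this makes it easy to test without touching the network.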

I'm decent with Python and know a little Beautiful Soup. Equipped with these tools, can someone please guide me on how to proceed with my quest?

Thank you!

RAVI
  • What have you actually tried, so far? – Dragonthoughts Dec 04 '18 at 09:09
  • Your question as it is now appears too broad to get decent answers. You already mentioned Beautiful Soup, which is an excellent tool. Could you maybe try something on your own and get back here with a more specific question? – toti08 Dec 04 '18 at 09:27
  • If I'm looking at a single page, like a link for a single book, I can do this. But how can I do this for all the books that exist on Goodreads? I wouldn't even have the links for all the books on the website to just iterate over, right? – RAVI Dec 04 '18 at 09:58
  • 1
    Solution to access multiple links in https://stackoverflow.com/questions/40629457/scrape-multiple-urls-using-beautiful-soup. Instead I would advise to use the Goodreads API to access its data. https://www.hongkiat.com/blog/goodreads-ratings-api/ – Swati Singh Apr 12 '19 at 05:22
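The multiple-URL approach from the comments starts by collecting per-book links from a listing page, then visiting each one. A minimal sketch with Beautiful Soup, parsing a hand-written HTML fragment in place of a live Goodreads page (the `tableList`/`bookTitle` class names and URL shape are assumptions about the page markup, which may differ or change):

```python
from bs4 import BeautifulSoup

# Tiny stand-in for a Goodreads list page; a real crawler would fetch
# this HTML with e.g. requests and loop over the paginated list URLs.
html = """
<table class="tableList">
  <tr><td><a class="bookTitle" href="/book/show/1-sample-one">Sample One</a></td></tr>
  <tr><td><a class="bookTitle" href="/book/show/2-sample-two">Sample Two</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect absolute URLs for every book link on the page.
links = ["https://www.goodreads.com" + a["href"]
         for a in soup.select("a.bookTitle")]
print(links)
```

Each collected link can then be fetched and checked against the rating criteria, though as the comments note, the Goodreads API is the more robust route than scraping.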

0 Answers