
I am trying to pull the full HTML from ratemyprofessors.com. However, at the bottom of the page there is a "Load More Ratings" button that has to be clicked to see more comments.

I am using requests.get(url) and BeautifulSoup, but that only gives the first 20 comments. Is there a way to have the page load all the comments before it returns?

Here is what I am currently doing. It returns the top 20 comments, but not all of them.

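    import requests
    from bs4 import BeautifulSoup

    # `url` is the professor page being scraped.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Only the ratings present in the initial HTML are found here.
    comments = []
    for j in soup.find_all('div', attrs={'class': 'Comments__StyledComments-dzzyvm-0 dEfjGB'}):
        comments.append(j.text)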
  • Using the dev network console, track what request is being made to load more, and then make a similar request. – Sri Apr 18 '20 at 18:10

1 Answer


BeautifulSoup is an HTML parser for static pages rather than a renderer for dynamic web apps, so it only sees the markup that the server sends in the initial response.

You could achieve what you want with Selenium driving a headless browser: render the full page, then repeatedly click the "Load More Ratings" button until there is nothing more to load.

Example: Clicking on a link via selenium
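
As a rough sketch of the Selenium approach described above (not tested against the live site): the click loop is kept separate from the browser code so it can be reasoned about on its own. The function names, the XPath for the button, and the `Comments__StyledComments` class prefix are assumptions based on the question, so adjust them to the page's actual markup.

```python
import time


def click_until_gone(find_button, pause=1.0, max_clicks=200):
    """Click whatever find_button() returns until it returns None.

    Returns the number of clicks performed. max_clicks is a safety cap.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break
        button.click()
        clicks += 1
        time.sleep(pause)  # crude wait for the next batch of ratings
    return clicks


def scrape_all_comments(url):
    # Imported lazily so the helper above has no third-party dependency.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)

        def find_button():
            # Assumption: a <button> whose text contains this label.
            found = driver.find_elements(
                By.XPATH, "//button[contains(., 'Load More Ratings')]"
            )
            return found[0] if found else None

        click_until_gone(find_button)
        # Assumption: match on the stable part of the class name.
        nodes = driver.find_elements(
            By.CSS_SELECTOR, "div[class*='Comments__StyledComments']"
        )
        return [node.text for node in nodes]
    finally:
        driver.quit()
```

Calling `scrape_all_comments(url)` would then return the comment texts as a list. If a cookie banner or ad overlaps the button, the click may raise an exception, in which case an explicit wait or a scroll to the button is probably needed.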

Since you're already using Requests, another option that might work is Requests-HTML, which also supports dynamic rendering by calling .html.render() on the response object.

Example: https://requests-html.kennethreitz.org/index.html#requests_html.HTML.render
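
A minimal sketch of the Requests-HTML route, assuming the package is installed (`pip install requests-html`). The helper names and the class-prefix selector are illustrative assumptions taken from the question. Note that rendering only executes the page's JavaScript; it does not click the button by itself, so this may still return just the first batch unless a click is issued, for example through the `script` argument of `render()`.

```python
def comment_texts(elements):
    """Return the stripped, non-empty text of each element."""
    return [e.text.strip() for e in elements if e.text and e.text.strip()]


def fetch_rendered_comments(url, sleep=2):
    # Imported lazily; the first render() call downloads Chromium.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # Run the page's JavaScript and give it `sleep` seconds to settle.
    response.html.render(sleep=sleep)
    # Assumption: match on the stable part of the class name.
    elements = response.html.find("div[class*='Comments__StyledComments']")
    return comment_texts(elements)
```

Usage would be `comments = fetch_rendered_comments(url)`, with `url` being the same page as in the question.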

Reference: Clicking link using beautifulsoup in python

Taylor D. Edmiston