Get full HTML for page with dynamic expanded containers with Python

Question

I am trying to pull the full HTML from ratemyprofessors.com. However, at the bottom of the page there is a "Load More Ratings" button that reveals more comments.

I am using requests.get(url) and BeautifulSoup, but that only returns the first 20 comments. Is there a way to have the page load all the comments before it returns?

Here is what I am currently doing, which gives the top 20 comments but not all of them.

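    import requests
    from bs4 import BeautifulSoup

    # url is the professor page being scraped
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    comments = []
    # Only the comments present in the initial HTML are found here
    for j in soup.find_all('div', attrs={'class': 'Comments__StyledComments-dzzyvm-0 dEfjGB'}):
        comments.append(j.text)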

Using the Network tab of your browser's developer tools, track which request is made when more ratings are loaded, and then make a similar request yourself. – Sri, Apr 18 '20 at 18:10

score 0 · Accepted Answer · answered Apr 18 '20 at 18:23

BeautifulSoup is an HTML parser for static pages rather than a renderer for dynamic web apps.

You could achieve what you want with a headless browser driven by Selenium: render the full page and repeatedly click the "Load More Ratings" button until there is nothing more to load.

Example: Clicking on a link via selenium
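
The click-until-done loop could look roughly like the sketch below. It is not tested against the live site: the XPath for the button, the fixed pause between clicks, and the names click_until_gone and fetch_full_html are all assumptions made for illustration, and Chrome with a matching driver is assumed to be installed.

```python
import time


def click_until_gone(driver, locator, pause=1.0, max_clicks=200):
    """Click the element matched by `locator` until it no longer appears.

    Returns the number of clicks performed. `max_clicks` is a safety cap.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(*locator)  # empty list when nothing matches
        if not buttons:
            break
        # A JavaScript click still works when an overlay covers the button.
        driver.execute_script("arguments[0].click();", buttons[0])
        clicks += 1
        time.sleep(pause)  # crude wait for the next batch of ratings to load
    return clicks


def fetch_full_html(url):
    """Open `url` in headless Chrome, expand all ratings, return the HTML."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the control is a <button> containing this text.
        locator = (By.XPATH, "//button[contains(., 'Load More Ratings')]")
        click_until_gone(driver, locator)
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can then be passed to BeautifulSoup(html, "html.parser") in place of response.text, and the existing loop over the comment divs should work unchanged. A WebDriverWait on the number of loaded comments would be more robust than the fixed sleep.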

Since you're already using Requests, another option that might work is Requests-HTML, which also supports dynamic rendering by calling .html.render() on the response object.

Example: https://requests-html.kennethreitz.org/index.html#requests_html.HTML.render
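
A minimal sketch of the Requests-HTML route follows. The function name fetch_rendered_html, the optional session parameter, and the two-second sleep are assumptions for illustration. Note that render() runs the page's JavaScript but does not click anything by itself, so it may not be enough to reveal every rating on this site.

```python
def fetch_rendered_html(url, session=None, sleep=2):
    """Return the HTML of `url` after its JavaScript has been executed."""
    if session is None:
        # Imported lazily; install with: pip install requests-html
        from requests_html import HTMLSession
        session = HTMLSession()
    response = session.get(url)
    # render() loads the page in headless Chromium (downloaded on first use)
    # and replaces the stored HTML with the rendered version.
    response.html.render(sleep=sleep)
    return response.html.html
```

As with the Selenium approach, the result can be handed to BeautifulSoup for parsing.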

Reference: Clicking link using beautifulsoup in python

Taylor D. Edmiston

This worked perfectly. I had to change around my code, but using selenium was what I needed to do. Thanks for your help! – David Hafen Apr 21 '20 at 22:34
@DavidHafen Glad to hear it. Feel free to vote and/or accept the answer if it helped solve your problem. – Taylor D. Edmiston Apr 23 '20 at 19:01
