The button I'm trying to press looks like:

<a data-hide="#mvc-paginate-acf46b3a1b68090c" data-append="true" data-container="#posts-container" class="hubmvc-ajax-get mvc-more btn btn-skel-generic" href="https://example.com/linear-box-load-more/?load_more=1&amp;pg=2&amp;limit=36&amp;offset=14&amp;additional_class=gems&amp;ajax_hook=next_page&amp;_wpnonce=8762751649&amp;start_pg=1">View More</a>

I am trying to press it several times before scraping the page with BeautifulSoup. I tried using webdriver from selenium, but apparently it is no longer supported, according to this answer: Scrape page with "load more results" button

excelsior
  • What is the site? You are probably better off using requests. Selenium should only be used as a last resort. – antfuentes87 Jun 19 '19 at 16:37
  • Are any of those class names unique to this button? If so, you could use `driver.find_element_by_class_name("mvc-more")` (or whatever class name is unique) – John Gordon Jun 19 '19 at 17:07
  • @antfuentes87 I believe the ajax in the above button can't be clicked with using requests based on other answers – excelsior Jun 19 '19 at 17:58
  • What? That statement does not make any sense? If you can provide the site, I can write you a example of what I am talking about. – antfuentes87 Jun 19 '19 at 18:08
  • @antfuentes87 the website I'm trying to scrape is: https://newsnetwork.mayoclinic.org/secondary-archive/ and the button I'm attempting to press is `view more` – excelsior Jun 19 '19 at 18:22

1 Answer

I looked at the Network tab in Chrome Developer Tools and noticed that the page makes a GET request when the button is clicked. The following code makes the same GET request to fetch the articles for one specific "page". In the params, change the `pg` number to whichever page you want. This worked when I tested it. The one limitation is that it returns the HTML only for the articles on that specific page, not for all of them. To get the HTML for all the pages, you could create a requests Session or loop through GET requests, one per page.


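import requests

# Query parameters copied from the GET request seen in the Network tab
# when the "View More" button is clicked.
params = {
    'load_more': '1',
    'pg': '2',  # page number to fetch; change this to request another page
    'limit': '36',
    'offset': '14',
    'additional_class': 'gems',
    'ajax_hook': 'next_page',
    '_wpnonce': '8762751649',
    'start_pg': '1',
    'hub_mvc_ajax': '1',
    'mvc_fastload': '3a0a558385',
}
next_url = "https://newsnetwork.mayoclinic.org/linear-box-load-more/"

# Request the HTML for the chosen page.
next_page = requests.get(next_url, params=params)

print(next_page.text)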
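
To collect several pages rather than one, the request above can be repeated in a loop. The following is a minimal sketch, not a tested scraper for this site. It assumes that only `pg` needs to change between requests (the `offset` value may need adjusting too), and that the `_wpnonce` and `mvc_fastload` values are still accepted by the server; `_wpnonce` looks like a WordPress nonce, which normally expires, so it may have to be copied fresh from the live page. The names `BASE_PARAMS`, `page_params`, and `fetch_pages` are made up for this example.

```python
BASE_URL = "https://newsnetwork.mayoclinic.org/linear-box-load-more/"

# Same parameters as in the single request above.
BASE_PARAMS = {
    'load_more': '1',
    'pg': '2',
    'limit': '36',
    'offset': '14',
    'additional_class': 'gems',
    'ajax_hook': 'next_page',
    '_wpnonce': '8762751649',   # assumption: still valid; may need refreshing
    'start_pg': '1',
    'hub_mvc_ajax': '1',
    'mvc_fastload': '3a0a558385',
}


def page_params(pg):
    """Return a copy of BASE_PARAMS with 'pg' set to the given page number."""
    params = dict(BASE_PARAMS)
    params['pg'] = str(pg)
    return params


def fetch_pages(first_pg, last_pg, session=None):
    """Fetch pages first_pg..last_pg (inclusive) and return their HTML as a list."""
    if session is None:
        # Imported here so a custom session object can be passed in instead.
        import requests
        session = requests.Session()

    pages = []
    for pg in range(first_pg, last_pg + 1):
        response = session.get(BASE_URL, params=page_params(pg), timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of storing them
        pages.append(response.text)
    return pages
```

Calling `fetch_pages(2, 5)` would request pages 2 through 5 over one Session, and each returned string can then be passed to BeautifulSoup for parsing. A short pause between requests (for example `time.sleep(1)` inside the loop) is a reasonable courtesy to the site.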
LuckyZakary