I was trying to scrape Reuters for NLP analysis and most of it is working, but I cannot get the code to click the "load more" button to load more news articles. Below is the code I am currently using:

import csv
import time
import pprint
from datetime import datetime, timedelta
import requests
import nltk
nltk.download('vader_lexicon')
from urllib.request import urlopen
from bs4 import BeautifulSoup
from bs4.element import Tag

comp_name = 'Apple'
url = 'https://www.reuters.com/search/news?blob=' + comp_name + '&sortBy=date&dateRange=all'

res = requests.get(url)  # the URL has no format placeholder, so fetch it directly
soup = BeautifulSoup(res.text,"lxml")
for item in soup.find_all("h3",{"class":"search-result-title"}):
    link = item.find("a")
    article_addr = link["href"]          # relative path returned by the search page
    headline = link.get_text()
    article_link = 'https://www.reuters.com' + article_addr

    try:
        resp = requests.get(article_link)  # article_addr is relative, so fetch the absolute link
    except Exception:
        continue

    sauce = BeautifulSoup(resp.text,"lxml")
    dateTag = sauce.find("div",{"class":"ArticleHeader_date"})
    contentTag = sauce.find("div",{"class":"StandardArticleBody_body"})

    date = None
    title = None
    content = None

    if isinstance(dateTag, Tag):
        date = dateTag.get_text().partition('/')[0]
    if isinstance(contentTag, Tag):
        content = contentTag.get_text().strip()
        # get_text() strips the markup, so take the paragraphs from the
        # tag itself rather than re-parsing the stripped plain text
        sentences = contentTag.find_all("p")
    time.sleep(3)
    print(date, headline, article_link)

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import time

browser = webdriver.Safari()
browser.get('https://www.reuters.com/search/news?blob=' + comp_name + '&sortBy=date&dateRange=all')
try:
    element = WebDriverWait(browser, 3).until(EC.presence_of_element_located((By.ID,'Id_Of_Element')))
except TimeoutException: 
    print("Time out!") 

2 Answers


To click the element with the text LOAD MORE RESULTS you need to induce WebDriverWait for element_to_be_clickable(), and you can use the following Locator Strategy:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    
    comp_name = 'Apple'
    driver.get('https://www.reuters.com/search/news?blob=' + comp_name + '&sortBy=date&dateRange=all')
    while True:
        try:
            driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='search-result-more-txt']"))))
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='search-result-more-txt']"))).click()
            print("LOAD MORE RESULTS button clicked")
        except TimeoutException:
            print("No more LOAD MORE RESULTS button to be clicked")
            break
    driver.quit()
    
  • Console Output:

    LOAD MORE RESULTS button clicked
    LOAD MORE RESULTS button clicked
    LOAD MORE RESULTS button clicked
    .
    .
    No more LOAD MORE RESULTS button to be clicked
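
  • Verification (a hypothetical extra step, not part of the code above): before calling driver.quit(), you can count the result headlines now present in the DOM, assuming the same h3 search-result-title markup the question parses:

    results = driver.find_elements(By.XPATH, "//h3[@class='search-result-title']")
    print("%d results loaded" % len(results))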
    

undetected Selenium

To click on LOAD MORE RESULTS, induce WebDriverWait() with element_to_be_clickable().

Use a while loop and check counter < 11 to click the button 10 times.

I have tested this on Chrome since I don't have the Safari browser, but it should work there too (see the Safari note after the code).

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

comp_name="Apple"
browser = webdriver.Chrome()
browser.get('https://www.reuters.com/search/news?blob=' + comp_name + '&sortBy=date&dateRange=all')

# Accept the terms button (comment this out if the consent banner does not appear in your browser)
WebDriverWait(browser,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#_evidon-banner-acceptbutton"))).click()
i = 1
while i < 11:
    try:
        element = WebDriverWait(browser, 10).until(EC.element_to_be_clickable(
            (By.XPATH, "//div[@class='search-result-more-txt' and text()='LOAD MORE RESULTS']")))
        element.location_once_scrolled_into_view  # scroll the button into view
        browser.execute_script("arguments[0].click();", element)
        print(i)
        i = i + 1
    except TimeoutException:
        print("Time out!")
        break  # button no longer present, stop retrying
KunduK
  • TimeoutException Traceback (most recent call last), raised at the "Accept the terms button" line, WebDriverWait(browser,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#_evidon-banner-acceptbutton"))).click(), ending in raise TimeoutException(message, screen, stacktrace) ... TimeoutException: Message: – web_scraper123 Jan 06 '20 at 19:13
  • Thank you for your reply, however I keep getting the error message above. – web_scraper123 Jan 06 '20 at 19:14
  • I am not sure whether you get that pop-up on Safari or not; can you comment out that line and check? I get the pop-up when interacting with Chrome, hence I added it, but you can comment out that line of code. – KunduK Jan 06 '20 at 19:19
  • I'm also wondering if there is any advice on how to incorporate the results loaded after clicking "load more results" into my previous code? Since the URL doesn't change after more results load, the following two lines still fetch the same content: res = requests.get(url) and soup = BeautifulSoup(res.text,"lxml"). Could you please kindly suggest? (A sketch follows these comments.) – web_scraper123 Jan 07 '20 at 16:39
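
One way to wire the two halves together, sketched below (a hypothetical combination of the answers above, reusing the question's URL and CSS classes, not code either answerer posted): let Selenium click through every LOAD MORE RESULTS page, then parse browser.page_source with BeautifulSoup instead of calling requests.get again, because requests never sees what JavaScript loaded:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup

comp_name = 'Apple'
browser = webdriver.Safari()
browser.get('https://www.reuters.com/search/news?blob=' + comp_name + '&sortBy=date&dateRange=all')

# Click LOAD MORE RESULTS until the button stops appearing
while True:
    try:
        element = WebDriverWait(browser, 10).until(EC.element_to_be_clickable(
            (By.XPATH, "//div[@class='search-result-more-txt']")))
        browser.execute_script("arguments[0].click();", element)
    except TimeoutException:
        break

# The URL never changes, so parse the rendered DOM rather than re-fetching it
soup = BeautifulSoup(browser.page_source, "lxml")
for item in soup.find_all("h3", {"class": "search-result-title"}):
    link = item.find("a")
    print(link.get_text(), 'https://www.reuters.com' + link["href"])

browser.quit()

The key point is browser.page_source: it reflects the DOM after all the clicks, whereas requests.get(url) only ever returns the first page of results.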